Scala / Spark: map in word reduce


Each line looks like this (tab-separated): 123 123 123

How does map work here? What is the role of 'line'?

val tokenized = file.map(line => (line.split("\t")(1), line.split("\t")(2).toInt))

What is tokenized?

Thanks!

Here is an informal explanation:

The map() method, when applied to a collection of one type of thing (e.g. a collection of lines in a file) and given a function (e.g. "extract the second and third items from a string"), returns a collection of the results of applying that function to each item in the original collection (e.g. a collection of tuples containing the second and third items of each line).
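To make the description above concrete, here is a minimal plain-Scala sketch (no Spark needed) that applies a function to each line with map, and then builds the same result with an explicit loop so you can see what map does for you. The helper secondAndThird, the object name MapSketch and the sample lines are made up for illustration; the lines are tab-separated like the file in the question.

```scala
import scala.collection.mutable.ListBuffer

object MapSketch {
  // Hypothetical helper: pulls the second field (as a String) and the
  // third field (converted to an Int) out of one tab-separated line.
  def secondAndThird(line: String): (String, Int) = {
    val fields = line.split("\t")
    (fields(1), fields(2).toInt)
  }

  // Made-up sample lines standing in for the file's contents.
  val lines: List[String] = List("a\t222\t333", "b\t555\t666")

  // map applies the function to every element and collects the results.
  val viaMap: List[(String, Int)] = lines.map(secondAndThird)

  // The same result built by hand with a loop, for comparison.
  val viaLoop: List[(String, Int)] = {
    val buf = ListBuffer.empty[(String, Int)]
    for (line <- lines) buf += secondAndThird(line)
    buf.toList
  }

  def main(args: Array[String]): Unit = {
    println(viaMap)            // List((222,333), (555,666))
    println(viaMap == viaLoop) // true
  }
}
```

Running main prints List((222,333), (555,666)) and then true: map is a compact way of writing that loop.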

The syntax

line=>(line.blah()) 

is shorthand for defining a function. The input parameter is declared with the name 'line', and the output is the result of evaluating the expression. In this expression, the result is the second item on the line (as a string) and the third item (converted to an integer), returned together as a 'tuple'.
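To show that line => (...) really is just a function value, the sketch below writes the question's lambda three ways: as a function stored in a val, as a named method, and as a variant that calls split only once. The names extract, extractDef, extractOnce and LambdaSketch are hypothetical; the sample input mimics the tab-separated line 123 123 123 from the question.

```scala
object LambdaSketch {
  // The anonymous function from the question, stored in a val so its
  // type is visible: it takes a String and returns a (String, Int) tuple.
  val extract: String => (String, Int) =
    line => (line.split("\t")(1), line.split("\t")(2).toInt)

  // The same logic written as a named method with an explicit parameter.
  def extractDef(line: String): (String, Int) =
    (line.split("\t")(1), line.split("\t")(2).toInt)

  // A variant that splits the line once instead of twice; the result is
  // the same for lines with at least three tab-separated fields.
  val extractOnce: String => (String, Int) = { line =>
    val fields = line.split("\t")
    (fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit = {
    val sample = "123\t123\t123" // tab-separated, like the line in the question
    println(extract(sample))                            // (123,123)
    println(extractDef(sample) == extractOnce(sample))  // true
  }
}
```

Any of these can be handed to map, e.g. file.map(extractOnce). Splitting once simply avoids parsing the same line twice.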

Here is a variation you can paste into the Scala interactive interpreter. It fakes the file with a list and splits each line on spaces instead of tabs:

val file = List("111 222 333", "444 555 666")

file: List[String] = List(111 222 333, 444 555 666)

val tokenized = file.map(line => (line.split(" ")(1), line.split(" ")(2).toInt))

tokenized: List[(String, Int)] = List((222,333), (555,666))

So here you can see that the result is of type List[(String, Int)]: a list of (String, Int) tuples.
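To answer "what is tokenized?" a bit more directly: in the interpreter example it is a List of Tuple2 values, and each element's parts can be read with _1/_2 or by pattern matching. If file were a Spark RDD[String] (for example from sc.textFile), the same map call would give an RDD[(String, Int)] instead of a List; the sketch below sticks to a plain List so it runs without Spark. The object name TokenizedSketch and the "add 1" step (just there to show the second field is a real Int) are illustrative only.

```scala
object TokenizedSketch {
  val file: List[String] = List("111 222 333", "444 555 666")

  // Explicit type annotation: a List whose elements are (String, Int) pairs.
  val tokenized: List[(String, Int)] =
    file.map(line => (line.split(" ")(1), line.split(" ")(2).toInt))

  // Each element is a Tuple2: _1 is the String field, _2 is the Int field.
  val firstKey: String = tokenized.head._1
  val firstValue: Int = tokenized.head._2

  // Pattern matching names the parts of each tuple; n + 1 works because n is an Int.
  val described: List[String] =
    tokenized.map { case (key, n) => s"$key -> ${n + 1}" }

  def main(args: Array[String]): Unit = {
    println(firstKey)   // 222
    println(firstValue) // 333
    println(described)  // List(222 -> 334, 555 -> 667)
  }
}
```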

