dictionary - Scala / Spark: map in word reduce
Given a line such as: 123 123 123
How does map work here, and what is the role of line?
val tokenized = file.map(line => (line.split("\t")(1), line.split("\t")(2).toInt))
What is tokenized?
Thanks
Here is an informal explanation:
The map() method, when applied to a collection of one type of thing (e.g. a collection of the lines in a file) and given a function (e.g. one that extracts the second and third items from a string), returns a collection of the results of applying that function to each item in the original collection (e.g. a collection of tuples containing the second and third items of each line).
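As a minimal sketch of that description, here is map() on a plain List. The object name MapBasics and the sample words are made up for illustration; the point is only that the result has one element per input element, in the same order, and that the element type may change.

```scala
object MapBasics {
  // A collection of one type of thing: strings.
  val words: List[String] = List("spark", "scala", "map")

  // map applies the function to each item and collects the results.
  // Here the element type changes from String to Int.
  val lengths: List[Int] = words.map(word => word.length)

  // The element type can also stay the same.
  val shouted: List[String] = words.map(word => word.toUpperCase)

  def main(args: Array[String]): Unit = {
    println(lengths)
    println(shouted)
  }
}
```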
The syntax
line => (line.blah()) is shorthand for defining a function. The input parameter is declared with the name 'line', and the output is the result of evaluating the expression. In this expression, the result is the second item on the line as a String and the third item converted to an Int (returned together as a 'tuple').
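To make the shorthand concrete, the sketch below writes the same function twice: once as an ordinary named method and once as the arrow form stored in a value. The names LambdaVsNamed, secondAndThird, asLambda and the sample line are hypothetical, and it splits on spaces rather than tabs so the sample is easy to read.

```scala
object LambdaVsNamed {
  // An ordinary named method: takes a line, returns a (String, Int) tuple.
  def secondAndThird(line: String): (String, Int) =
    (line.split(" ")(1), line.split(" ")(2).toInt)

  // The same function in arrow form. 'line' is just the parameter name;
  // any other name would work equally well.
  val asLambda: String => (String, Int) =
    line => (line.split(" ")(1), line.split(" ")(2).toInt)

  val sample: String = "111 222 333"
  val fromNamed: (String, Int) = secondAndThird(sample)
  val fromLambda: (String, Int) = asLambda(sample)

  def main(args: Array[String]): Unit = {
    println(fromNamed)
    println(fromLambda)
  }
}
```

Either form can be passed to map(), so file.map(secondAndThird) and file.map(asLambda) should behave the same way as writing the arrow expression inline.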
Here is a variation you can paste into the Scala interactive interpreter; it fakes the file and splits each line on spaces instead of tabs:
scala> val file = List("111 222 333", "444 555 666")
file: List[String] = List(111 222 333, 444 555 666)
scala> val tokenized = file.map(line => (line.split(" ")(1), line.split(" ")(2).toInt))
tokenized: List[(String, Int)] = List((222,333), (555,666))
So here you can see that the result is of type List[(String, Int)].
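For completeness, here is a rough sketch of the tab-separated case from the question, again using a plain List as a stand-in for the file. The object name TabSeparatedSketch and the sample data are invented; the block form of the function is a small variation that splits each line once instead of twice.

```scala
object TabSeparatedSketch {
  // Stand-in for the file: each element is one tab-separated line.
  // (Hypothetical sample data, not taken from the question.)
  val file: List[String] = List("a\tx\t1", "b\ty\t2")

  // Same idea as the one-liner in the question, but the line is split once
  // and the fields are reused.
  val tokenized: List[(String, Int)] = file.map { line =>
    val fields = line.split("\t")
    (fields(1), fields(2).toInt)
  }

  def main(args: Array[String]): Unit =
    println(tokenized)
}
```

If file came from Spark's textFile instead, it would be an RDD[String] rather than a List[String], and tokenized would be an RDD[(String, Int)]; the function passed to map would look the same.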