scala - How to transpose an RDD in Spark
I have an RDD like this:
1 2 3
4 5 6
7 8 9
It is a matrix. I want to transpose the RDD so that it looks like this:
1 4 7
2 5 8
3 6 9
How can I do this?
Say you have an N×M matrix.
If both N and M are so small that you can hold N×M items in memory, it does not make much sense to use an RDD. But transposing it is easy:
val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
// Bring the whole matrix to the driver, transpose it there, and redistribute it.
val transposed = sc.parallelize(rdd.collect.toSeq.transpose)
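The transposition above happens entirely on the driver. As a minimal local sketch (no Spark needed), assuming the collected rows arrive as an `Array[Seq[Int]]`, which is what `rdd.collect` returns for the RDD above, the step reduces to the standard Scala collections `transpose`. The object `LocalTranspose` and its `collected` value are hypothetical stand-ins for illustration.

```scala
object LocalTranspose {
  // Hypothetical stand-in for `rdd.collect`: the rows as they would arrive on the driver.
  val collected: Array[Seq[Int]] = Array(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))

  // Same call chain as in the answer: convert to a Seq, then swap rows and columns.
  val transposed: Seq[Seq[Int]] = collected.toSeq.transpose

  def main(args: Array[String]): Unit =
    transposed.foreach(row => println(row.mkString(" ")))
}
```

Running `LocalTranspose.main` prints `1 4 7`, `2 5 8` and `3 6 9` on separate lines. Because every element passes through the driver, this approach only makes sense when the whole matrix fits there.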
If N or M is so large that you cannot hold N or M entries in memory, then you cannot have an RDD line of that size, because each RDD element has to fit in memory. Either the original or the transposed matrix is impossible to represent in this case.
N and M may be of a medium size: you can hold N or M entries in memory, but you cannot hold N×M entries. In this case you have to blow up the matrix and put it together again:
// Only needed on Spark versions before 1.3, where pair-RDD methods required this implicit import.
import org.apache.spark.SparkContext._

val rdd = sc.parallelize(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
// Split the matrix into one number per line, keyed by column index.
val byColumnAndRow = rdd.zipWithIndex.flatMap {
  case (row, rowIndex) => row.zipWithIndex.map {
    case (number, columnIndex) => (columnIndex, (rowIndex, number))
  }
}
// Build the transposed matrix. Group and sort by column index first.
val byColumn = byColumnAndRow.groupByKey.sortByKey().values
// Then sort each column by row index, since groupByKey does not preserve order.
val transposed = byColumn.map {
  indexedRow => indexedRow.toSeq.sortBy(_._1).map(_._2)
}
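The explode, group and re-sort steps can be tried out on plain Scala collections before running them on a cluster. The following is a local sketch under that assumption, not Spark code: `groupBy` followed by a sort on the key stands in for `groupByKey.sortByKey()`, and `BlowUpTranspose` is a hypothetical name. The final per-column `sortBy` is what matters in Spark, because `groupByKey` does not guarantee the order of the values inside a group; locally it is simply harmless.

```scala
object BlowUpTranspose {
  def transpose(matrix: Seq[Seq[Int]]): Seq[Seq[Int]] = {
    // Split the matrix into one (columnIndex, (rowIndex, number)) entry per number,
    // mirroring rdd.zipWithIndex.flatMap (row indices are Long, as in Spark).
    val byColumnAndRow: Seq[(Int, (Long, Int))] =
      matrix.zipWithIndex.flatMap { case (row, rowIndex) =>
        row.zipWithIndex.map { case (number, columnIndex) =>
          (columnIndex, (rowIndex.toLong, number))
        }
      }

    // Group by column index and order the groups by that index
    // (local stand-in for groupByKey followed by sortByKey()).
    val byColumn: Seq[Seq[(Long, Int)]] =
      byColumnAndRow
        .groupBy(_._1)
        .toSeq
        .sortBy(_._1)
        .map { case (_, entries) => entries.map(_._2) }

    // Inside each column, restore row order; each column becomes a row of the result.
    byColumn.map(indexedRow => indexedRow.sortBy(_._1).map(_._2))
  }

  def main(args: Array[String]): Unit = {
    val matrix = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9))
    transpose(matrix).foreach(row => println(row.mkString(" ")))
  }
}
```

The same steps also handle non-square input: a 3×2 matrix `Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))` comes back as the 2×3 matrix `Seq(Seq(1, 3, 5), Seq(2, 4, 6))`.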