Details
- Type: Question
- Status: Resolved
- Priority: Critical
- Resolution: Invalid
- Affects Version/s: 2.3.1
- Fix Version/s: None
- Component/s: None
Description
Hi,
I have a use case where I have a SparkR DataFrame and I want to iterate over it in a for loop using its row numbers. Is this possible?
The only solution I have now is to collect() the SparkR DataFrame into an R data.frame, which brings the entire DataFrame onto the driver node, and then iterate over it using row numbers. But since the for loop executes only on the driver node, I lose the advantage of parallel processing in Spark, which was the whole purpose of using Spark in the first place. Please help.
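The workaround described above can be sketched as follows (a minimal illustration, assuming an active Spark session; the DataFrame name `sdf` is hypothetical):

```r
library(SparkR)
# Assumes sparkR.session() has already been called and `sdf` is a SparkR DataFrame.

local_df <- collect(sdf)  # pulls the entire DataFrame onto the driver node

# Sequential iteration by row number -- this runs only on the driver,
# so no Spark parallelism is used here.
for (i in seq_len(nrow(local_df))) {
  row <- local_df[i, ]
  # ... process one row at a time ...
}
```

For distributed processing, SparkR also provides dapply()/gapply(), which apply an R function to each partition or group of a DataFrame in parallel without collecting it to the driver.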
Thank You,
Asif Khan