Details
Type: Bug
Status: Resolved
Priority: P3
Resolution: Won't Fix
Affects Version/s: 2.13.0, 2.14.0, 2.16.0
Fix Version/s: None
Environment:
Cloudera Express 6.2.0
Java Version: 1.8.0_181
Spark 2.4.0-cdh6.2.0
1 master node and 3 data nodes (64 cores, 128 GB RAM)
--driver-memory "2g" --num-executors "6" --executor-cores "3"
Flags: Important
Description
I have source and target CSV files with 10 million records and 250 columns. I am running an Apache Beam pipeline that joins all columns from the source and target files. When I run it on the Spark cluster, the pipeline completes with no exceptions, but the Beam metrics counter for the join returns double the expected count when the following Spark property is used: --executor-memory "2g". When I increase the executor memory to 11g, it returns the correct count.
The count doubles only when I write the results to a file; if I don't write them out, the counts are correct.
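One possible explanation, which this report does not confirm, is that the join stage is evaluated twice when its output cannot be kept in memory, so a counter incremented inside the join is bumped once per evaluation. The sketch below is plain Java, not Beam or Spark code. `RecomputeDemo`, `joinStage`, `consume`, and `run` are hypothetical names used only to illustrate how a side-effect counter can double when two downstream consumers each re-run the same upstream step.

```java
import java.util.concurrent.atomic.AtomicLong;

public class RecomputeDemo {
    // Stand-in for a metrics counter incremented inside a join step (hypothetical).
    static final AtomicLong joinedRecords = new AtomicLong();

    // Hypothetical "join" stage: produces n records and counts each one as a side effect.
    static int[] joinStage(int n) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = i;
            joinedRecords.incrementAndGet();
        }
        return out;
    }

    // Stand-in for a downstream consumer, such as a count or a file write.
    static void consume(int[] records) {
        // Intentionally empty: only the upstream side effect matters here.
    }

    // Runs two consumers over the join output and returns the counter value.
    // cached = true:  the join output is computed once and reused.
    // cached = false: each consumer re-runs the join, as a lazy engine may do
    //                 when the intermediate result is not retained in memory.
    static long run(int n, boolean cached) {
        joinedRecords.set(0);
        if (cached) {
            int[] out = joinStage(n);
            consume(out);
            consume(out);
        } else {
            consume(joinStage(n));
            consume(joinStage(n));
        }
        return joinedRecords.get();
    }

    public static void main(String[] args) {
        System.out.println("cached:   " + run(10, true));
        System.out.println("uncached: " + run(10, false));
    }
}
```

Under this assumption, the counter matches the record count when the join output is reused and is twice as large when it is recomputed. That would be consistent with the count doubling only when the results are also written to a file and executor memory is low, but it is a hypothesis to check, not a diagnosis.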