Details
- Type: Wish
- Status: Resolved
- Priority: Major
- Resolution: Won't Fix
- Affects Version/s: 2.4.0
- Fix Version/s: None
- Component/s: None
Description
When we run the action DataFrame.collect(), the check against the spark.driver.maxResultSize limit is performed on the size of the compressed byte array, which underestimates the real size of the returned data.
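For reference, the limit in question is a standard Spark configuration; a typical way to set it (the 2g value is only an illustration, not a recommendation) is:

```shell
# Raise the cap on total serialized result size per action
# (applies to the compressed bytes shipped back to the driver)
spark-submit --conf spark.driver.maxResultSize=2g my_app.py
```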
This matters when fetching data through Spark Thrift Server with incremental collection disabled: the driver collects the data of every partition of the DataFrame at once.

The returned data goes through the following steps:
- the data's byte array is compressed
- the compressed data is packaged as a ResultSet
- it is returned to the driver, where its size is checked against spark.driver.maxResultSize
- it is decoded (uncompressed) into an Array[Row]

The uncompressed data can be more than ten times larger than the compressed data, so the limit check can pass even though the decoded result far exceeds the intended limit.
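To see why judging by the compressed size is misleading, here is a minimal, Spark-independent Python sketch. It uses pickle and zlib as stand-ins for Spark's serializer and compression codec (an assumption for illustration only) to show how much smaller the shipped bytes can be than the data the driver actually has to hold after decoding:

```python
import pickle
import zlib

# Simulate one collected partition: highly redundant rows, as is
# common for real tabular data (repeated strings, small numeric ranges).
rows = [("active", "us-east-1", i % 10) for i in range(100_000)]

serialized = pickle.dumps(rows)          # stand-in for the serialized rows
compressed = zlib.compress(serialized)   # stand-in for the compressed byte array

# The driver-side limit check sees only len(compressed), but the memory
# actually needed after decoding corresponds to the uncompressed size.
ratio = len(serialized) / len(compressed)
print(f"uncompressed: {len(serialized)} bytes")
print(f"compressed:   {len(compressed)} bytes")
print(f"ratio:        {ratio:.1f}x")
```

With redundant data like this, the compression ratio is large, so a limit applied to the compressed size says little about the decoded Array[Row] the driver must materialize.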
Issue Links
- is duplicated by SPARK-28761 "spark.driver.maxResultSize only applies to compressed data" (Resolved)
- links to