Details
- Type: Question
- Status: Resolved
- Priority: Major
- Resolution: Invalid
- Affects Version/s: 3.1.2
- Fix Version/s: None
- Component/s: None
Description
Hi All,
We started switching our Spark pipelines to read Parquet files with ZSTD compression.
After the switch, the memory footprint is much larger than it previously was with SNAPPY.
Additionally, the jobs' GC stats are much higher than with SNAPPY on the same workload.
Are there any configurations relevant to the read path that might help in such cases?
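Not an authoritative answer, but a sketch of standard Spark properties that are sometimes worth experimenting with in this situation. All of these are documented Spark settings, but whether they help is workload-dependent, and the values below are illustrative placeholders, not recommendations:

```
# Illustrative spark-defaults.conf fragment; values are placeholders.

# Extra headroom for native (off-heap) decompression buffers,
# which do not show up in the JVM heap.
spark.executor.memoryOverhead        2g

# Keep Spark's internal shuffle/spill codec independent of the
# Parquet file codec, so any ZSTD overhead is paid only on the
# Parquet read path (lz4 is the default).
spark.io.compression.codec           lz4

# If shuffle/spill is also on ZSTD, these control per-stream
# buffer memory and compression effort (defaults: 32k, level 1).
spark.io.compression.zstd.bufferSize 32k
spark.io.compression.zstd.level      1
```

Note that `spark.io.compression.*` affects Spark's internal I/O (shuffle, spills, broadcasts), not Parquet decompression itself; comparing executor GC time and off-heap usage in the Spark UI between SNAPPY and ZSTD runs of the same job would help isolate where the extra memory is going.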