Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.3.0
- None
- None
Description
Currently, the Parquet vectorized reader eagerly decodes every column in the read schema before any filter has been applied. This is costly, because values belonging to rows that a filter later discards are decoded for nothing. It would be better to materialize these column vectors lazily, only when the data are actually needed.
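To make the idea concrete, here is a minimal sketch of lazy materialization in Python. It is not Spark's implementation: `LazyColumnVector`, `scan`, and the byte-string "encoding" are hypothetical names invented for illustration, and a real Parquet reader decodes in batches and pages rather than one value at a time. The sketch only shows the ordering: decode the filter column first, then decode the remaining columns for the surviving rows only.

```python
from typing import Callable, Dict, Iterable, List, Sequence


class LazyColumnVector:
    """Hypothetical column vector that keeps values encoded until requested."""

    def __init__(self, encoded: Sequence[bytes], decode: Callable[[bytes], object]):
        self._encoded = encoded
        self._decode = decode
        self.decode_calls = 0  # instrumentation for this example only

    def __len__(self) -> int:
        return len(self._encoded)

    def materialize(self, rows: Iterable[int]) -> List[object]:
        """Decode only the requested row positions."""
        out = []
        for r in rows:
            self.decode_calls += 1
            out.append(self._decode(self._encoded[r]))
        return out


def scan(
    columns: Dict[str, LazyColumnVector],
    filter_column: str,
    predicate: Callable[[object], bool],
    projected: List[str],
) -> Dict[str, List[object]]:
    """Decode the filter column, then decode other columns for matching rows."""
    filter_vec = columns[filter_column]
    filter_values = filter_vec.materialize(range(len(filter_vec)))
    selected = [i for i, v in enumerate(filter_values) if predicate(v)]

    result = {}
    for name in projected:
        if name == filter_column:
            # Already decoded above; reuse instead of decoding again.
            result[name] = [filter_values[i] for i in selected]
        else:
            result[name] = columns[name].materialize(selected)
    return result


# Toy data: values are stored as ASCII bytes and "decoding" parses them.
columns = {
    "id": LazyColumnVector([b"1", b"2", b"3", b"4"], lambda b: int(b.decode())),
    "name": LazyColumnVector([b"a", b"b", b"c", b"d"], lambda b: b.decode()),
}

# Roughly: SELECT name WHERE id > 2
result = scan(columns, "id", lambda v: v > 2, ["name"])
```

In this toy run the `id` column is decoded for all four rows, while `name` is decoded only for the two rows that pass the filter. An eager reader would have decoded all eight values up front.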
Issue Links
- is related to
  - SPARK-35743 Improve Parquet vectorized reader (Resolved)
- relates to
  - SPARK-42256 SPIP: Lazy Materialization for Parquet Read Performance Improvement (Open)
  - SPARK-25643 Performance issues querying wide rows (Open)