  1. Spark
  2. SPARK-29454

Reduce the number of unsafeProjection calls when reading Parquet files


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.3, 2.3.4, 2.4.4
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      When a Parquet data file is read through ParquetRecordReader, ParquetGroupConverter calls unsafeProjection on every row to convert the SpecificInternalRow to an UnsafeRow. When partitionSchema is not empty, ParquetFileFormat then calls unsafeProjection again to convert that UnsafeRow to another UnsafeRow. With DataSourceV2, PartitionReaderWithPartitionValues always performs this second conversion.

      I think the first conversion, in ParquetGroupConverter, is redundant; it is enough for ParquetRecordReader to return a SpecificInternalRow.
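
To make the cost concrete, here is a toy model of the read path described above. It is not Spark code: `CountingProjection` and `read_rows` are hypothetical names, plain lists stand in for SpecificInternalRow, and tuples stand in for UnsafeRow. The sketch only counts how many projection calls each variant makes for the same output, assuming a non-empty partition schema.

```python
class CountingProjection:
    """Hypothetical stand-in for an UnsafeProjection: copies a row, counts calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, row):
        self.calls += 1
        return tuple(row)  # a tuple models the converted ("unsafe") row


def read_rows(raw_rows, partition_values, convert_in_reader):
    """Toy read path; returns (output rows, total projection calls)."""
    reader_projection = CountingProjection()
    append_projection = CountingProjection()
    out = []
    for raw in raw_rows:
        # Reader stage: optionally convert the mutable row right away.
        row = reader_projection(raw) if convert_in_reader else raw
        # Format stage: append the partition values, then project once more.
        out.append(append_projection(list(row) + list(partition_values)))
    return out, reader_projection.calls + append_projection.calls


rows = [[1, "a"], [2, "b"], [3, "c"]]
before, calls_before = read_rows(rows, ["p0"], convert_in_reader=True)
after, calls_after = read_rows(rows, ["p0"], convert_in_reader=False)
assert before == after            # both variants produce identical output
print(calls_before, calls_after)  # prints "6 3"
```

In this model, dropping the reader-side conversion halves the number of projection calls per row without changing the result, which is the saving the proposal is after.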


            People

              Assignee: LuciferYang Yang Jie
              Reporter: LuciferYang Yang Jie
              Votes: 0
              Watchers: 2
