Spark / SPARK-42696

Speed up parquet reading with Java Vector API

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      Parquet now supports using the Java 17 Vector API for bit-unpacking, which gives a 4x to 8x performance gain in microbenchmarks.
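
For readers unfamiliar with bit-unpacking, here is a minimal scalar sketch of the operation, assuming Parquet's LSB-first bit-packed layout. The class and method names (`BitUnpackSketch`, `unpack`) are hypothetical and are not Parquet's or Spark's API. A Vector API implementation does the same shift-and-mask work on many values per instruction, which is where the reported speedup would come from.

```java
// Hypothetical illustration only: not the Parquet or Spark implementation.
final class BitUnpackSketch {

    // Unpacks `count` values of `bitWidth` bits each (1..32) from a byte
    // array in which values are packed back to back, least significant
    // bit first.
    static int[] unpack(byte[] packed, int bitWidth, int count) {
        int[] out = new int[count];
        long mask = (1L << bitWidth) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bitWidth;   // absolute bit offset of value i
            int byteIdx = (int) (bitPos >>> 3);  // first byte holding the value
            int shift = (int) (bitPos & 7);      // bit offset inside that byte

            // Gather up to 5 bytes: enough for a 32-bit value shifted by up to 7 bits.
            long word = 0;
            for (int b = 0; b < 5 && byteIdx + b < packed.length; b++) {
                word |= (packed[byteIdx + b] & 0xFFL) << (8 * b);
            }
            out[i] = (int) ((word >>> shift) & mask);
        }
        return out;
    }
}
```

For example, the three bytes `0x88 0xC6 0xFA` with a bit width of 3 unpack to the eight values 0 through 7.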

      I have run the TPC-H (SF100) benchmark with the Parquet optimization integrated into Spark. The gain differs from query to query; Q6 improves by up to 11%.

      Please assign this issue to me and I will submit a PR. Thanks!

          People

            jiangjiguang0719 jiangjiguang0719
            jiangjiguang0719 jiangjiguang0719
