  1. Spark
  2. SPARK-47888

Spark supporting Parquet V2


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • None
    • None

    Description

      Parquet V2 encoding makes Parquet files better optimized: reads and writes are faster than with Parquet V1. We use Dremio to read and write Parquet, and with Parquet V2 encoding reads are 75% faster and writes are 25% faster.

      Apache Parquet-MR Writer version PARQUET_2_0, which is widely adopted by engines that write Parquet data, supports delta encodings. However, these encodings were not previously supported by Dremio's vectorized Parquet reader, resulting in decreased speed. Now, in version 24.3 and Dremio Cloud, when you use the Dremio SQL query engine on Parquet datasets, you’ll receive best-in-class performance.

      https://www.dremio.com/blog/vectorized-reading-of-parquet-v2-improves-performance-up-to-75/
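The "delta encodings" mentioned above can be illustrated with a minimal sketch. It is a simplified illustration of the idea only, not the Parquet implementation: Parquet's actual DELTA_BINARY_PACKED encoding also groups deltas into blocks and bit-packs them. The function names and the sample column are hypothetical.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the original values with a running sum over the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


# Hypothetical column of slowly increasing timestamps (assumption for illustration).
timestamps = [1_700_000_000, 1_700_000_005, 1_700_000_009, 1_700_000_020]
encoded = delta_encode(timestamps)
```

After the first entry, the encoded column holds only small differences, which need far fewer bits than the full values. This is why a reader that cannot decode such pages on its fast path loses performance on files written with PARQUET_2_0.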

          People

            Assignee: Unassigned
            Reporter: PremSahoo
            Votes: 0
            Watchers: 1
