Parquet / PARQUET-2222

[Format] RLE encoding spec incorrect for v2 data pages

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: format-2.10.0
    • Component/s: parquet-format
    • Labels: None

    Description

      The spec (https://github.com/apache/parquet-format/blob/master/Encodings.md#run-length-encoding--bit-packing-hybrid-rle--3) has this:

      rle-bit-packed-hybrid: <length> <encoded-data>
      length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
      

      But the length is actually prepended only in v1 data pages, not in v2 data pages.
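
A minimal sketch of the difference, not taken from any Parquet implementation: the helper names `read_rle_levels_v1` and `read_rle_levels_v2` are hypothetical, and the sketch assumes that a v2 reader gets the byte length of the encoded levels from the page header (the `definition_levels_byte_length` and `repetition_levels_byte_length` fields of `DataPageHeaderV2`) rather than from the page body. The encoded data itself is treated as opaque bytes.

```python
import struct


def read_rle_levels_v1(page: bytes, offset: int = 0):
    """v1 data page: <length> <encoded-data>.

    The length is a 4-byte little-endian unsigned int32 stored in the
    page body, directly in front of the encoded data.
    Returns (encoded_data, offset_after_levels).
    """
    (length,) = struct.unpack_from("<I", page, offset)
    start = offset + 4
    end = start + length
    if end > len(page):
        raise ValueError("length prefix runs past the end of the page")
    return page[start:end], end


def read_rle_levels_v2(page: bytes, byte_length: int, offset: int = 0):
    """v2 data page: <encoded-data> only, no length prefix in the body.

    Assumption: byte_length is supplied by the caller, e.g. read from
    the page header.
    Returns (encoded_data, offset_after_levels).
    """
    end = offset + byte_length
    if end > len(page):
        raise ValueError("byte_length runs past the end of the page")
    return page[offset:end], end


# Opaque example payload standing in for RLE/bit-packed hybrid data.
encoded = bytes([0x10, 0x01])

# The same levels laid out the v1 way (prefixed) and the v2 way (bare).
v1_page = struct.pack("<I", len(encoded)) + encoded + b"values"
v2_page = encoded + b"values"

v1_levels, v1_next = read_rle_levels_v1(v1_page)
v2_levels, v2_next = read_rle_levels_v2(v2_page, len(encoded))
```

Both calls recover the same encoded bytes; only the v1 reader has to skip the extra 4 bytes, which is why a spec that describes the prefix unconditionally would mislead a v2 reader.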

          People

            Assignee: mwish Xuwei Fu
            Reporter: apitrou Antoine Pitrou