PARQUET-2435: Clarify behavior of DELTA_BINARY_PACKED encoders/decoders

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: format-2.11.0
    • Component/s: parquet-format
    • Labels: None

    Description

      I brought this issue up some time ago on the mailing list [1]. In short, I would like to add some clarification to the DELTA_BINARY_PACKED section of Encodings.md. The issue is that while the specification does not limit the number of bits that can be used to encode deltas, some readers expect a maximum of 32 bits for INT32 data and 64 bits for INT64 data [2]. I propose adding language to the specification to the effect that using 33 bits to encode INT32 data (or 65 bits for INT64 data) is not recommended, and that readers should be able to read such data but are not required to.

      [1] https://lists.apache.org/thread/2wj88oghc0t6qqj8ojp5p5tf8wg11840

      [2] https://github.com/apache/arrow/issues/20374
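
To illustrate how a delta for INT32 data can end up needing 33 bits, here is a minimal Python sketch. It is not taken from the specification or from any reader or writer implementation: the helper names `wrap32` and `packed_bit_width` are hypothetical, and the model is simplified to a single block with no miniblocks. It compares the width needed when deltas are computed with unbounded integers against the width needed when every subtraction wraps around in 32-bit two's complement.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap32(x):
    """Wrap an integer to signed 32-bit two's complement (hypothetical helper)."""
    return (x + 2**31) % 2**32 - 2**31

def packed_bit_width(values, wrap):
    """Bits needed to pack (delta - min_delta) for one simplified block.

    wrap=False: unbounded integer arithmetic, no overflow handling.
    wrap=True:  every subtraction wraps to 32 bits; the adjusted
                deltas are then treated as unsigned 32-bit values.
    """
    deltas = [b - a for a, b in zip(values, values[1:])]
    if wrap:
        deltas = [wrap32(d) for d in deltas]
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    if wrap:
        adjusted = [x % 2**32 for x in adjusted]
    return max(adjusted).bit_length()

# Worst case for INT32: swinging between the two extremes.
values = [INT32_MIN, INT32_MAX, INT32_MIN]
wide = packed_bit_width(values, wrap=False)
narrow = packed_bit_width(values, wrap=True)
print(wide, narrow)
```

In this simplified model the unbounded computation needs more than 32 bits for the adjusted deltas, while the wrapping computation stays within 32 bits. A decoder that also wraps its additions would, under the same assumptions, recover the original values.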

            People

              etseidl Edward Seidl