Parquet / PARQUET-2221

[Format] Encoding spec incorrect for dictionary fallback

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: parquet-format
    • Labels: None

    Description

      The spec for dictionary encoding (PLAIN_DICTIONARY / RLE_DICTIONARY) states that:

      If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding.

      https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8
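
      As a rough illustration of the behavior that sentence describes (and only that sentence, not parquet-mr's actual code), the sketch below dictionary-encodes pages of strings until the dictionary would exceed a limit, then writes every later page as plain values. The function name, the two thresholds, and the page-level granularity of the fallback are all assumptions made for this example.

```python
# Hypothetical limits for illustration only; real writers choose their own thresholds.
MAX_DICT_ENTRIES = 4
MAX_DICT_BYTES = 64


def encode_pages(pages, max_entries=MAX_DICT_ENTRIES, max_bytes=MAX_DICT_BYTES):
    """Encode pages of strings, falling back to plain once the dictionary is too big.

    Returns (dictionary, encoded_pages). Each encoded page is either
    ("dictionary", [indices into dictionary]) or ("plain", [raw values]).
    """
    dictionary = []    # distinct values in first-seen order
    index_of = {}      # value -> position in dictionary
    dict_bytes = 0     # total UTF-8 size of the dictionary values
    fell_back = False  # once True, all remaining pages are plain (assumption)
    encoded = []

    for page in pages:
        if not fell_back:
            # Collect the values this page would add to the dictionary.
            new_values = []
            for value in page:
                if value not in index_of and value not in new_values:
                    new_values.append(value)
            new_bytes = sum(len(v.encode("utf-8")) for v in new_values)

            too_many = len(dictionary) + len(new_values) > max_entries
            too_large = dict_bytes + new_bytes > max_bytes
            if too_many or too_large:
                fell_back = True
            else:
                for value in new_values:
                    index_of[value] = len(dictionary)
                    dictionary.append(value)
                dict_bytes += new_bytes

        if fell_back:
            encoded.append(("plain", list(page)))
        else:
            encoded.append(("dictionary", [index_of[v] for v in page]))

    return dictionary, encoded
```

      With max_entries=4, the pages ["a", "b", "a"], ["b", "c"], ["d", "e", "f"], ["a"] yield two dictionary pages followed by two plain pages: the third page would push the dictionary to six entries, so it and everything after it are written plain.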

      However, the parquet-mr implementation was deliberately changed to a different fallback mechanism in https://issues.apache.org/jira/browse/PARQUET-52

      I'm assuming the parquet-mr implementation is authoritative here. If so, the spec is incorrect and should be updated to reflect the expected behavior.

          People

            Assignee: Unassigned
            Reporter: Antoine Pitrou (apitrou)
            Votes: 0
            Watchers: 5
