Parquet / PARQUET-2069

Parquet file containing arrays, written by Parquet-MR, cannot be read again by Parquet-MR

Details

    • Type: Bug
    • Status: Open
    • Priority: Blocker
    • Resolution: Unresolved
    • Affects Version/s: 1.12.0
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None
    • Environment: Windows 10

    Description

      The attachments include an original file (original.parquet) and a modified file (modified.parquet). The modified file was produced by reading the original with Parquet-MR, changing a few values, and writing the result back with Parquet-MR. The schema of the input file is used as the schema for the output file, so the schema should be unchanged. However, the output file has a slightly different schema, and reading it back the same way with Parquet-MR fails with the exception:  java.lang.ClassCastException: optional binary element (STRING) is not a group

      My guess is that the issue lies in the Avro schema conversion.

      The Parquet files attached have some arrays and some nested fields.

      Attachments

        1. modified.parquet
          4 kB
          Devon Kozenieski
        2. original.parquet
          3 kB
          Devon Kozenieski
        3. parquet-diff.png
          282 kB
          Timothy Miller

            People

              Assignee: Unassigned
              Reporter: Devon Kozenieski (devonk)
              Votes: 0
              Watchers: 8
