  1. Apache Arrow
  2. ARROW-11271

[Rust] [Parquet] List schema to Arrow parser misinterpreting child nullability


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Rust

    Description

      We currently do not propagate child nullability correctly when reading Parquet files written by Spark 3.0.1 (parquet-mr 1.10.1).

      For example, the schema below, taken from https://github.com/apache/parquet-format/blob/master/LogicalTypes.md, is currently interpreted incorrectly:

      // List<String> (list nullable, elements non-null) 
      optional group my_list (LIST) {
          repeated group list { 
              required binary element (UTF8); 
          } 
      }

      The Arrow type should be:

      Field::new(
          "my_list",
          DataType::List(
              // child field: named "element", non-nullable
              Box::new(Field::new("element", DataType::Utf8, false)),
          ),
          true, // the list itself is nullable
      )

      but we currently end up with 

      Field::new(
          "my_list",
          DataType::List(
              // child field: wrongly named "list" and wrongly nullable
              Box::new(Field::new("list", DataType::Utf8, true)),
          ),
          true, // the list itself is nullable
      )
      

      This doesn't seem to be an issue on the master branch as of opening this issue, so it might not be severe enough to try to force into the 3.0.0 release.

      I tested null and non-null Spark files, and was able to read them correctly. This becomes an issue with nested lists, which I'm working on.
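
      The intended mapping for the `my_list` example can be sketched as below. This is only an illustration of the rule (the outer group's repetition gives the list's nullability, the innermost element gives the child's name and nullability, and the middle `repeated group list` contributes neither). All types and functions here (`Repetition`, `ParquetType`, `ArrowType`, `ArrowField`, `list_to_arrow`, `example_schema`) are hypothetical stand-ins, not the actual parquet or arrow crate APIs, and only the single-level UTF8 case from the example is handled.

```rust
// Hypothetical stand-in types; these are NOT the parquet or arrow crate APIs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Repetition {
    Required,
    Optional,
    Repeated,
}

#[derive(Debug, Clone, PartialEq)]
enum ParquetType {
    // A leaf column; only UTF8 binary is modelled in this sketch.
    Utf8 { name: String, repetition: Repetition },
    // A group node with child fields.
    Group { name: String, repetition: Repetition, fields: Vec<ParquetType> },
}

#[derive(Debug, Clone, PartialEq)]
enum ArrowType {
    Utf8,
    List(Box<ArrowField>),
}

#[derive(Debug, Clone, PartialEq)]
struct ArrowField {
    name: String,
    data_type: ArrowType,
    nullable: bool,
}

// Convert a three-level LIST group into an Arrow-like field.
// Returns None for any shape other than the one in the example.
fn list_to_arrow(node: &ParquetType) -> Option<ArrowField> {
    // Outer group: its repetition decides whether the list itself is nullable.
    let (name, repetition, fields) = match node {
        ParquetType::Group { name, repetition, fields } => (name, *repetition, fields),
        _ => return None,
    };
    // Middle level: a single repeated group; it supplies neither name nor nullability.
    let inner = match fields.as_slice() {
        [ParquetType::Group { repetition: Repetition::Repeated, fields, .. }] => fields,
        _ => return None,
    };
    // Innermost level: the element supplies the child name and nullability.
    let child = match inner.as_slice() {
        [ParquetType::Utf8 { name, repetition }] => ArrowField {
            name: name.clone(),
            data_type: ArrowType::Utf8,
            nullable: *repetition == Repetition::Optional,
        },
        _ => return None,
    };
    Some(ArrowField {
        name: name.clone(),
        data_type: ArrowType::List(Box::new(child)),
        nullable: repetition == Repetition::Optional,
    })
}

// Build the `my_list` schema from the example:
// optional group my_list (LIST) { repeated group list { required binary element (UTF8); } }
fn example_schema() -> ParquetType {
    ParquetType::Group {
        name: "my_list".to_string(),
        repetition: Repetition::Optional,
        fields: vec![ParquetType::Group {
            name: "list".to_string(),
            repetition: Repetition::Repeated,
            fields: vec![ParquetType::Utf8 {
                name: "element".to_string(),
                repetition: Repetition::Required,
            }],
        }],
    }
}
```

      Calling `list_to_arrow(&example_schema())` on this sketch yields a nullable `my_list` field whose child is a non-nullable `element`, which is the shape the expected Arrow type above describes.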

       

            People

              Assignee: nevi_me Neville Dipale
              Reporter: nevi_me Neville Dipale
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m