PARQUET-1887

Exception thrown by AvroParquetWriter#write causes all subsequent calls to it to fail

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.11.0, 1.8.3
    • Fix Version/s: None
    • Component/s: parquet-avro
    • Labels: None

    Description

      Please see sample code below:

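      import java.util.Arrays;
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.generic.GenericRecordBuilder;
      import org.apache.parquet.avro.AvroParquetWriter;
      import org.apache.parquet.hadoop.ParquetWriter;

      // The schema literal is a text block, which requires Java 15 or later.
      Schema schema = new Schema.Parser().parse("""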
              {
                "type": "record",
                "name": "person",
                "fields": [
                  {
                    "name": "address",
                    "type": [
                      "null",
                      {
                        "type": "array",
                        "items": "string"
                      }
                    ],
                    "default": null
                  }
                ]
              }
              """
      );
      
      ParquetWriter<GenericRecord> writer = AvroParquetWriter.<GenericRecord>builder(new org.apache.hadoop.fs.Path("/tmp/person.parquet"))
              .withSchema(schema)
              .build();
      
      try {
          // To trigger exception, add array with null element.
          writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("first", null, "last")).build());
      } catch (Exception e) {
          e.printStackTrace(); // "java.lang.NullPointerException: Array contains a null element at 1"
      }
      
      try {
          // At this point all future calls to writer.write will fail
          writer.write(new GenericRecordBuilder(schema).set("address", Arrays.asList("foo", "bar")).build());
      } catch (Exception e) {
          e.printStackTrace(); // "org.apache.parquet.io.InvalidRecordException: 1(r) > 0 ( schema r)"
      }
      
      writer.close();
      

      It seems to me that this is caused by writer state not being reset after the failed write. Is this the intended behavior of the writer? If so, does one have to create a new writer whenever a write fails?

      I'm able to reproduce this with both Parquet 1.8.3 and 1.11.0, and have attached a sample Parquet file for each version.
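
      Until the question above is answered, one way to keep the writer usable is to reject the offending record before it reaches writer.write, so the failing code path is never entered. The sketch below uses only the Java standard library. NullElementGuard and its methods are hypothetical names chosen for illustration and are not part of parquet-avro. It assumes the only trigger is a null element inside the array, as in the sample code, and it does not address whether the writer itself should recover from a failed write.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not part of parquet-avro. */
final class NullElementGuard {
    private NullElementGuard() {}

    /**
     * Returns the index of the first null element in the list,
     * or an empty OptionalInt if there is none.
     * A null list is treated as valid, since the sample schema
     * declares the field as a ["null", array] union.
     */
    static OptionalInt firstNullIndex(List<?> values) {
        if (values == null) {
            return OptionalInt.empty();
        }
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) == null) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    /**
     * Returns the list unchanged if it has no null elements;
     * otherwise throws before the record is ever built or written.
     */
    static <T> List<T> requireNoNullElements(String fieldName, List<T> values) {
        OptionalInt index = firstNullIndex(values);
        if (index.isPresent()) {
            throw new IllegalArgumentException(
                "Field '" + fieldName + "' contains a null element at " + index.getAsInt());
        }
        return values;
    }
}
```

      With this in place, the first write in the sample could be guarded as set("address", NullElementGuard.requireNoNullElements("address", Arrays.asList("first", null, "last"))). The IllegalArgumentException is then thrown before writer.write is called, so the writer is not touched by the invalid record.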

      Attachments

        1. person1_11_0.parquet
          0.3 kB
          Øyvind Strømmen
        2. person1_8_3.parquet
          0.3 kB
          Øyvind Strømmen

          People

            Assignee: Unassigned
            Reporter: Øyvind Strømmen (InsulaVentus)