Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-22670

ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file

    XMLWordPrintableJSON

Details

    Description

      ArrayIndexOutOfBoundsException is getting thrown while decoding dictionaryIds of a row group in parquet file with vectorization enabled. 

      Exception stack trace:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
       at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.decodeToBinary(PlainValuesDictionary.java:122)
       at org.apache.hadoop.hive.ql.io.parquet.vector.ParquetDataColumnReaderFactory$DefaultParquetDataColumnReader.readString(ParquetDataColumnReaderFactory.java:95)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.decodeDictionaryIds(VectorizedPrimitiveColumnReader.java:467)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedPrimitiveColumnReader.readBatch(VectorizedPrimitiveColumnReader.java:68)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:410)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:353)
       at org.apache.hadoop.hive.ql.io.parquet.vector.VectorizedParquetRecordReader.next(VectorizedParquetRecordReader.java:92)
       at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
       ... 24 more

       

      This issue seems to be caused by re-using the same dictionary column vector while reading consecutive row groups. This looks like one of the corner case bug which occurs for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit packed dictionary data into a column-vector based data structure. 

      Similar issue issue was reported in spark (Ref: https://issues.apache.org/jira/browse/SPARK-16334)

      Attachments

        1. HIVE-22670.1.patch
          8 kB
          Ganesha Shreedhara
        2. HIVE-22670.2.patch
          9 kB
          Ganesha Shreedhara

        Issue Links

          Activity

            People

              ganeshas Ganesha Shreedhara
              ganeshas Ganesha Shreedhara
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 0.5h
                  0.5h