Hive: HIVE-19479

encoded stream seek is incorrect for 0-length RGs in LLAP IO


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0, 3.1.0
    • Component/s: None
    • Labels:
      None

      Description

      The PositionProvider offset is not updated correctly for row groups (RGs) with 0-length data, and an error like this may occur:

      Caused by: java.lang.IllegalArgumentException: Seek in LENGTH to 541 is outside of the data
      	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:161)
      	at org.apache.orc.impl.InStream$UncompressedStream.seek(InStream.java:123)
      	at org.apache.orc.impl.RunLengthIntegerReaderV2.seek(RunLengthIntegerReaderV2.java:331)
      	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:298)
      	at org.apache.hadoop.hive.ql.io.orc.encoded.EncodedTreeReaderFactory$StringStreamReader.seek(EncodedTreeReaderFactory.java:258)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.repositionInStreams(OrcEncodedDataConsumer.java:250)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:134)
      	at org.apache.hadoop.hive.llap.io.decode.OrcEncodedDataConsumer.decodeBatch(OrcEncodedDataConsumer.java:62)
      

      We found that this happens when ORC writes an unusual stream combination: the data stream for an RG has no values (all of its rows are null), but the length stream contains values (zeros) for the same rows. This is technically a valid ORC file, although writing the zeros serves no purpose.
      This may be fixed separately in ORC, but since such files already exist in the wild, we should handle them correctly.
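      To make the failure mode concrete, here is a minimal Java sketch under a simplified model; it is not the actual Hive/ORC code or the attached patch. It assumes that an RG's seek positions form one flat list that column readers consume strictly in order. SimplePositionProvider, lengthSeekTarget and the per-stream position counts (one for DATA, two for LENGTH) are hypothetical. The sketch shows how skipping the DATA position for an RG whose data stream is empty leaves the offset un-advanced, so the length stream is seeked with a value that belongs to a different stream.

```java
import java.util.Arrays;

// Illustrative model only: names and position layout are assumptions.
public class PositionSkewDemo {

    // Hypothetical stand-in for a position provider: hands out the recorded
    // positions of one row group strictly in order.
    static final class SimplePositionProvider {
        private final long[] positions;
        private int index = 0;

        SimplePositionProvider(long... positions) {
            this.positions = positions.clone();
        }

        long getNext() {
            return positions[index++];
        }
    }

    // Returns {byteOffset, runOffset} that the LENGTH stream would be seeked to.
    // Assumed layout: DATA contributes 1 position, LENGTH contributes 2.
    // skipEmptyData = true models the buggy behavior of consuming nothing for
    // DATA when the data stream has no bytes for this RG.
    static long[] lengthSeekTarget(SimplePositionProvider pp,
                                   boolean dataIsEmpty, boolean skipEmptyData) {
        boolean consumeDataPosition = !(dataIsEmpty && skipEmptyData);
        if (consumeDataPosition) {
            // DATA byte offset: consumed even when DATA itself is not seeked,
            // so that later streams stay aligned.
            pp.getNext();
        }
        long byteOffset = pp.getNext();
        long runOffset = pp.getNext();
        return new long[] {byteOffset, runOffset};
    }

    public static void main(String[] args) {
        // Hypothetical positions for one RG: DATA at 100, LENGTH at byte 12, run offset 0.
        long[] rg = {100, 12, 0};

        long[] correct = lengthSeekTarget(new SimplePositionProvider(rg), true, false);
        long[] buggy = lengthSeekTarget(new SimplePositionProvider(rg), true, true);

        System.out.println("correct LENGTH seek: " + Arrays.toString(correct)); // [12, 0]
        System.out.println("buggy LENGTH seek:   " + Arrays.toString(buggy));   // [100, 12]
    }
}
```

      In the buggy case the length stream is asked to seek to byte 100, which can lie past the end of its data, analogous to the "outside of the data" error above. The design point of the sketch is that positions should be consumed for every stream in the row index whether or not the reader actually seeks that stream; the real fix may be structured differently.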

        Attachments

        1. HIVE-19479.patch
          12 kB
          Sergey Shelukhin
        2. HIVE-19479.01.patch
          12 kB
          Sergey Shelukhin


              People

              • Assignee:
                sershe Sergey Shelukhin
                Reporter:
                sershe Sergey Shelukhin
              • Votes:
                0
                Watchers:
                4
