ORC / ORC-555

IllegalArgumentException when reading files with compressed footers bigger than 16k

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.6.1, 1.7.0
    • Component/s: None
    • Labels: None

    Description

      I am using orc-core::nohive to read an ORC file that was generated by an older version of ORC (probably through Hive 1.1). The file reads fine with ORC 1.5.5, but reading it fails as of ORC 1.6.

      Code:

      final Reader orcReader = OrcFile.createReader(new Path("/Users/smahadik/orcFailure.orc"),
          OrcFile.readerOptions(new Configuration()));
      System.out.println(orcReader.getNumberOfRows());
      

      Stacktrace:

      java.io.IOException: Problem reading file footer /Users/smahadik/orcFailure.orc
      
      	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:716)
      	at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:500)
      	at org.apache.orc.OrcFile.createReader(OrcFile.java:365)
      	at example.testFileFooterReadFailure(TestOrcMetrics.java:16)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      	at java.lang.reflect.Method.invoke(Method.java:498)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
      	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
      	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
      	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
      	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
      	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
      Caused by: java.lang.IllegalArgumentException
      	at java.nio.Buffer.position(Buffer.java:244)
      	at org.apache.orc.impl.InStream$CompressedStream.setCurrent(InStream.java:453)
      	at org.apache.orc.impl.InStream$CompressedStream.reset(InStream.java:440)
      	at org.apache.orc.impl.InStream$CompressedStream.<init>(InStream.java:426)
      	at org.apache.orc.impl.InStream.create(InStream.java:843)
      	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:706)
      	... 25 more
      

      Unfortunately, I cannot share the data file that triggers the failure. I am not familiar with the ORC codebase, so I am not sure what is actually happening here. I will keep digging and post anything more I find.

      Here's what I know so far. The error occurs at https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/InStream.java#L453 because the limit of the compressed buffer is less than the position the code is trying to set. Execution goes through this if condition in ReaderImpl, which was changed recently: https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L691
      The extra value is around 3k, so the code seems to replace the original buffer (limit 16k) with a new buffer (limit 3k). This smaller buffer is passed to https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L706, and the call eventually fails.
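
      The mechanism described above can be reproduced in isolation. The sketch below is not ORC code: BufferLimitDemo and positionRejected are hypothetical names, and the sizes are only borrowed from this report. It shows that java.nio.Buffer.position throws IllegalArgumentException whenever the requested position is greater than the buffer's limit, which matches the top frame of the "Caused by" trace.

```java
import java.nio.ByteBuffer;

// Hypothetical demo class; it is not part of ORC.
final class BufferLimitDemo {
    /** Returns true if a buffer with the given limit rejects newPosition. */
    static boolean positionRejected(int limit, int newPosition) {
        // A freshly allocated buffer has limit == capacity.
        ByteBuffer chunk = ByteBuffer.allocate(limit);
        try {
            chunk.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown by Buffer.position when newPosition > limit (or negative).
            return true;
        }
    }

    public static void main(String[] args) {
        // A position equal to the limit is still legal.
        System.out.println(positionRejected(3930, 3930));
        // One byte past the limit is rejected.
        System.out.println(positionRejected(3930, 3931));
    }
}
```

      If the reader really does hand the 3k buffer to code that expects the full tail, any offset computed against the larger tail would fail in exactly this way.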

      Values of some variables at line 706 of ReaderImpl.java:
      size = 309950950
      readSize = 16384
      psLen = 26
      psOffset = 309950923
      tailSize = 20314
      footerSize = 3650
      metadataSize = 16637
      extra = 3930
      buffer = data range [309930636, 309934566), size: 3930 type: array-backed
      buffer.next = data range [309934566, 309950950), size: 16384 type: array-backed
      stripeStatSize = 0
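
      The values listed above are mutually consistent, which can be checked with a little arithmetic. The sketch below is a hypothetical helper (TailLayout and its methods are not ORC APIs), and its formulas are assumptions: that the tail consists of one postscript-length byte plus the postscript, footer, metadata and stripe statistics, and that the footer sits directly before the postscript.

```java
// Hypothetical helper; the formulas are assumptions that match the reported values.
final class TailLayout {
    /** Tail bytes: 1 length byte + postscript + footer + metadata + stripe statistics. */
    static long tailSize(long psLen, long footerSize, long metadataSize, long stripeStatSize) {
        return 1 + psLen + footerSize + metadataSize + stripeStatSize;
    }

    /** Bytes of the tail that did not fit into the initial read of readSize bytes. */
    static long extra(long tailSize, long readSize) {
        return Math.max(0, tailSize - readSize);
    }

    /** Offset of the footer, measured from the first byte of the tail. */
    static long footerOffsetInTail(long tailSize, long psLen, long footerSize) {
        return tailSize - 1 - psLen - footerSize;
    }

    public static void main(String[] args) {
        // Values copied from the report.
        long size = 309950950L, readSize = 16384, psLen = 26;
        long footerSize = 3650, metadataSize = 16637, stripeStatSize = 0;

        long tail = tailSize(psLen, footerSize, metadataSize, stripeStatSize);
        long extraBytes = extra(tail, readSize);
        long footerOffset = footerOffsetInTail(tail, psLen, footerSize);

        System.out.println("tailSize = " + tail);
        System.out.println("extra = " + extraBytes);
        System.out.println("tail starts at " + (size - tail));
        System.out.println("footer offset in tail = " + footerOffset);
        // True only if the footer begins inside the first (extra) chunk.
        System.out.println("footer in first chunk: " + (footerOffset < extraBytes));
    }
}
```

      Under these assumptions the helper reproduces tailSize = 20314 and extra = 3930, and the tail starts at 309930636, the start of the first buffer range. The footer would begin 16637 bytes into the tail, well past the 3930-byte limit of the first buffer, so a position computed against the whole tail cannot be set on that buffer alone.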

      Does anyone have any insights/intuition about what might be happening and how we can debug this?

            People

              Assignee: omalley Owen O'Malley
              Reporter: shardulm Shardul Mahadik