  1. Tika
  2. TIKA-2812

NPE when parsing text with write limit set on IBM JDK

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.20
    • Fix Version/s: None
    • Component/s: core
    • Environment: IBM JDK 8

    Description

      We updated Tika from version 1.14 to the recently released 1.20 and now get an NPE on IBM JDK 8 when parsing text with a write limit set (we use WriteOutContentHandler).

      Test class TikaTest.java and test file test.txt are attached.

      The issue occurs on IBM JDK 8 (output-ibm-jdk-tika-1.20.txt), but not on Oracle JDK (output-oracle-jdk-tika-1.20.txt) or OpenJDK 8 (output-open-jdk-tika-1.20.txt).

      With Tika 1.14 we did not have this issue (output-ibm-jdk-tika-1.14.txt).

      Analysis:
      With the fix in TIKA-2668 (https://github.com/apache/tika/commit/89a588e4d8d2aa44a9d3c965d514c18c7d3c134d#diff-5a28529cf32968d35a5036172cd8f74fL41) a line was removed from the constructor of the TaggedSAXException class:

      initCause(original); // SAXException has it's own chaining mechanism!
      

      Bringing the line back solves our issue on JDK 8, but breaks things on JDK 11 (output-oracle-jdk-11-tika-1.20.txt).
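To compare runtimes directly, a small probe along the following lines might help. It is only a sketch: the class name SaxCauseProbe and its two methods are made up for illustration and are not part of Tika. It assumes that the difference between JDKs lies in how org.xml.sax.SAXException stores the wrapped exception, which this report does not confirm. The probe checks whether getCause() already returns the wrapped exception, and whether a later initCause() call is accepted or rejected with an IllegalStateException.

```java
import org.xml.sax.SAXException;

/** Hypothetical diagnostic; not part of Tika. */
final class SaxCauseProbe {

    /** True if SAXException exposes the wrapped exception through getCause(). */
    static boolean exposesCause() {
        Exception inner = new Exception("inner");
        return new SAXException("outer", inner).getCause() == inner;
    }

    /** True if initCause() is still accepted after the wrapping constructor ran. */
    static boolean acceptsInitCause() {
        Exception inner = new Exception("inner");
        SAXException outer = new SAXException("outer", inner);
        try {
            outer.initCause(inner);
            return true;
        } catch (IllegalStateException e) {
            // Throwable.initCause() refuses to overwrite an already-set cause.
            return false;
        }
    }
}
```

Printing both values together with System.getProperty("java.vendor") and System.getProperty("java.version") on each runtime would show whether the cause chain behaves differently there.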

      Is there any chance that TaggedSAXException can be made compatible with both JDK 8 and JDK 11 (Oracle/OpenJDK as well as IBM)?
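One possible direction, sketched here under assumptions and not taken from the Tika code base, is to avoid initCause() altogether and have the subclass answer getCause() itself. The class PortableTaggedSAXException and its getTag() accessor are hypothetical stand-ins; the real TaggedSAXException may look different. The constructor is assumed to receive a non-null original exception.

```java
import java.io.Serializable;
import org.xml.sax.SAXException;

/** Hypothetical stand-in for a tagged SAX exception; not Tika's actual class. */
final class PortableTaggedSAXException extends SAXException {

    private static final long serialVersionUID = 1L;

    private final SAXException original;
    private final Serializable tag;

    PortableTaggedSAXException(SAXException original, Serializable tag) {
        // Pass the original along so getException() keeps working as before.
        super(original.getMessage(), original);
        this.original = original;
        this.tag = tag;
    }

    Serializable getTag() {
        return tag;
    }

    /**
     * Returns the wrapped exception without relying on how the runtime's
     * SAXException stores its cause, and without calling initCause().
     */
    @Override
    public synchronized Throwable getCause() {
        return original;
    }
}
```

Because the cause is returned from a field owned by the subclass, code that walks the getCause() chain should see the original exception regardless of whether the runtime's SAXException forwards the cause to Throwable. Whether this fits the rest of Tika's exception handling would need to be checked against the attached test on all three JDKs.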

      Thank you in advance!

      Kind regards
      Sergiy Shyrkov

      Attachments

        1. TikaTest.java
          2 kB
          Sergiy Shyrkov
        2. test.txt
          0.0 kB
          Sergiy Shyrkov
        3. output-oracle-jdk-tika-1.20.txt
          0.3 kB
          Sergiy Shyrkov
        4. output-oracle-jdk-11-tika-1.20.txt
          4 kB
          Sergiy Shyrkov
        5. output-open-jdk-tika-1.20.txt
          0.3 kB
          Sergiy Shyrkov
        6. output-ibm-jdk-tika-1.20.txt
          0.9 kB
          Sergiy Shyrkov
        7. output-ibm-jdk-tika-1.14.txt
          0.3 kB
          Sergiy Shyrkov

          People

            Assignee: Tim Allison (tallison)
            Reporter: Sergiy Shyrkov (shyrkov)