Details
Bug
Status: Open
Major
Resolution: Unresolved
1.20
None
IBM JDK 8
Description
We have updated Tika from version 1.14 to the recently released 1.20 and are now seeing a problem when parsing text with a write limit set (we are using WriteOutContentHandler) on IBM JDK 8.
Test class TikaTest.java and test file test.txt are attached.
The issue is present on IBM JDK 8 (output-ibm-jdk-tika-1.20.txt), but not on Oracle JDK (output-oracle-jdk-tika-1.20.txt) or OpenJDK 8 (output-open-jdk-tika-1.20.txt).
With Tika 1.14 we did not have this issue (output-ibm-jdk-tika-1.14.txt).
Analysis:
With the fix in TIKA-2668 (https://github.com/apache/tika/commit/89a588e4d8d2aa44a9d3c965d514c18c7d3c134d#diff-5a28529cf32968d35a5036172cd8f74fL41) a line was removed from the constructor of the TaggedSAXException class:
initCause(original); // SAXException has it's own chaining mechanism!
Restoring the line solves our issue on JDK 8, but breaks things on JDK 11 (output-oracle-jdk-11-tika-1.20.txt).
Is there any chance the TaggedSAXException class can be made compatible with both JDK 8 and JDK 11 (Oracle/OpenJDK as well as IBM)?
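One possible direction, as an untested sketch only: call initCause only when the JDK's SAXException does not already report the wrapped exception through getCause(). The class name ChainedSAXException below is hypothetical and merely stands in for TaggedSAXException. The sketch also assumes that the IBM JDK 8 SAXException returns null from getCause() unless initCause is called, which is our guess from the observed behaviour and not something we have confirmed in its sources.

```java
import org.xml.sax.SAXException;

public class CompatibleCauseSketch {

    /** Hypothetical stand-in for TaggedSAXException; not Tika's actual class. */
    static class ChainedSAXException extends SAXException {
        private static final long serialVersionUID = 1L;

        ChainedSAXException(SAXException original) {
            super(original.getMessage(), original);
            // Set the Throwable cause only if the JDK's SAXException does not
            // already expose the wrapped exception through getCause().
            if (getCause() == null) {
                try {
                    initCause(original);
                } catch (IllegalStateException alreadySet) {
                    // The superclass already initialized the cause; nothing to do.
                }
            }
        }
    }

    /** Walks the cause chain looking for the given exception instance. */
    static boolean hasInChain(Throwable t, Throwable target) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SAXException original = new SAXException("write limit reached");
        SAXException wrapped = new ChainedSAXException(original);
        System.out.println("cause preserved: " + hasInChain(wrapped, original));
    }
}
```

The guard is there so that the same constructor never calls initCause on a JDK where the cause is already set, while still chaining the cause on a JDK where it is not.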
Thank you in advance!
Kind regards
Sergiy Shyrkov