Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: 1.25.0, 2.0.0-M2
None
Description
ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
	at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
	at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
	at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
	at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
	at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
	at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
	at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
	at org.apache.lucene.index.DefaultIndexingChain$PerField.
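The log shows only the first 30 bytes of the rejected term, printed as decimal UTF-8 byte values. The small Java sketch below decodes that prefix and checks a value against the 32766-byte limit quoted in the message. The class and method names are illustrative only and are not part of NiFi or Lucene. Decoded, the prefix reads "12706237393452835.0.avro.0.avr", so even the visible part of the filename already repeats the ".0.avro" fragment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of NiFi or Lucene.
final class ImmenseTermPrefix {
    // Decimal byte values copied from "The prefix of the first immense term is: ..." in the log
    static final int[] PREFIX = {49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53,
            46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114};

    // Per-term limit stated in the error message ("bytes can be at most 32766 in length")
    static final int MAX_TERM_BYTES = 32766;

    // Turns the logged decimal byte values back into text
    static String decode(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True if the value, indexed as a single term, would exceed the limit
    static boolean exceedsTermLimit(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(decode(PREFIX)); // prints 12706237393452835.0.avro.0.avr
    }
}
```

A 74483-byte filename, the size reported in the log, makes exceedsTermLimit return true.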
Looking at the code, filename appears to be the only attribute that can be set to an arbitrary value and is not currently protected against overly large values.
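One way to protect the field is to cap the indexed value by its UTF-8 byte length rather than its character count, without splitting a surrogate pair. The sketch below assumes that truncating the indexed copy of the filename is acceptable. FilenameIndexGuard and truncateToUtf8Bytes are hypothetical names, and this is not necessarily how the issue was fixed in NiFi. Lucene exposes the same limit as IndexWriter.MAX_TERM_LENGTH; the sketch hard-codes 32766 so that it stays self-contained.

```java
// Hypothetical guard, assuming truncation of the indexed value is acceptable.
final class FilenameIndexGuard {
    // Same value as the limit in the error message (Lucene: IndexWriter.MAX_TERM_LENGTH)
    static final int MAX_TERM_BYTES = 32766;

    // Returns the longest prefix of value whose UTF-8 encoding fits in maxBytes,
    // stopping at code point boundaries so a surrogate pair is never split.
    static String truncateToUtf8Bytes(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = utf8Length(cp);
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }

    // UTF-8 byte length of a code point. A lone surrogate is counted as 3 bytes,
    // which overestimates (Java encodes it as a 1-byte '?'), so the result stays within maxBytes.
    static int utf8Length(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 74483; n++) {
            sb.append('a');
        }
        String safe = truncateToUtf8Bytes(sb.toString(), MAX_TERM_BYTES);
        System.out.println(safe.length()); // prints 32766
    }
}
```

Counting code points matters here because a single Java char can need up to 3 UTF-8 bytes, and a supplementary character needs 4 bytes across 2 chars. A plain character-count limit would therefore not guarantee that the term fits.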