Apache NiFi / NIFI-12850

Failure to index Provenance Events with large filename attribute

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.25.0, 2.0.0-M2
    • Fix Version/s: 2.0.0-M3, 1.26.0
    • Component/s: Core Framework
    • Labels: None

    Description

      ERROR org.apache.nifi.provenance.index.lucene.EventIndexTask: Failed to index Provenance Events
      java.lang.IllegalArgumentException: Document contains at least one immense term in field="filename" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56, 51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114]...', original message: bytes can be at most 32766 in length; got 74483
          at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:984)
          at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:527)
          at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:491)
          at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:208)
          at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:415)
          at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1471)
          at org.apache.lucene.index.IndexWriter.addDocuments(IndexWriter.java:1444)
          at org.apache.nifi.provenance.lucene.LuceneEventIndexWriter.index(LuceneEventIndexWriter.java:70)
          at org.apache.nifi.provenance.index.lucene.EventIndexTask.index(EventIndexTask.java:202)
          at org.apache.nifi.provenance.index.lucene.EventIndexTask.run(EventIndexTask.java:113)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:750)
      Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 74483
          at org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:281)
          at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:182)
          at org.apache.lucene.index.DefaultIndexingChain$PerField.
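
      The "prefix of the first immense term" in the error is printed as raw UTF-8 byte values. The minimal Java sketch below decodes those 30 bytes with the standard library, showing that the rejected filename begins with the text 12706237393452835.0.avro.0.avr. The full value was 74483 bytes, more than twice the 32766-byte limit. The class and method names are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the term prefix that Lucene printed in the error above.
public class ImmenseTermPrefix {

    // The 30 bytes shown as "The prefix of the first immense term is: [...]".
    static final byte[] PREFIX = {
        49, 50, 55, 48, 54, 50, 51, 55, 51, 57, 51, 52, 53, 50, 56,
        51, 53, 46, 48, 46, 97, 118, 114, 111, 46, 48, 46, 97, 118, 114
    };

    // Lucene reports the term bytes as its UTF-8 encoding, so decode them as UTF-8.
    static String decode(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(PREFIX)); // prints 12706237393452835.0.avro.0.avr
    }
}
```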

      Looking at the code, filename appears to be the only attribute that can be set to an arbitrary value and is not currently protected against overly large values.
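
      One way to add the missing protection is to shorten the filename value before it is indexed, so that its UTF-8 encoding never exceeds 32766 bytes. The sketch below is a hypothetical helper, not NiFi's actual fix for this issue, which may handle the oversized value differently. It cuts only on code-point boundaries so a multi-byte character is never split.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for illustration; NiFi's actual fix for NIFI-12850 may differ.
public class FilenameTermGuard {

    // Maximum length in bytes of a single indexed term, as reported in the error above.
    static final int MAX_TERM_BYTES = 32766;

    // Returns the longest prefix of value whose UTF-8 encoding fits in maxBytes,
    // cutting only at code-point boundaries so surrogate pairs stay together.
    // A lone surrogate is counted as 3 bytes, which over-estimates (String.getBytes
    // replaces it with a 1-byte '?'), so the result still fits.
    static String truncateUtf8(String value, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < value.length()) {
            int cp = value.codePointAt(i);
            int cpBytes = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + cpBytes > maxBytes) {
                break;
            }
            bytes += cpBytes;
            i += Character.charCount(cp);
        }
        return value.substring(0, i);
    }

    public static void main(String[] args) {
        // Build an ASCII filename as long as the one in the report (74483 bytes).
        char[] chars = new char[74483];
        Arrays.fill(chars, 'x');
        String safe = truncateUtf8(new String(chars), MAX_TERM_BYTES);
        System.out.println(safe.getBytes(StandardCharsets.UTF_8).length); // prints 32766
    }
}
```

Counting bytes per code point avoids re-encoding the string on every step and avoids cutting through a multi-byte character, which a naive byte-array truncation could do.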

    People

      Assignee: Pierre Villard (pvillard)
      Reporter: Pierre Villard (pvillard)
      Votes: 0
      Watchers: 2

    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 50m