Jackrabbit Oak / OAK-9123

Error: Document contains at least one immense term



    Description

      11:35:09.400 [I/O dispatcher 1] ERROR o.a.j.o.p.i.e.i.ElasticIndexWriter - Bulk item with id /wikipedia/76/84/National Palace (Mexico) failed
      org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="text.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 123, 73, 110, 102, 111, 98, 111, 120, 32, 104, 105, 115, 116, 111, 114, 105, 99, 32, 98, 117, 105, 108, 100, 105, 110, 103, 10, 124, 110]...', original message: bytes can be at most 32766 in length; got 33409]
      at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
      at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
      at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:138)
      at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:196)
      at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1888)
      at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1676)
      at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1758)
      at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:590)
      at org.elasticsearch.client.RestClient$1.completed(RestClient.java:333)
      at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
      at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
      at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
      at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
      at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
      at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
      at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
      at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
      at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
      at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
      at java.lang.Thread.run(Thread.java:745)
      Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 33409]
      at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
      at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
      at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
      ... 24 common frames omitted

This happens with huge keyword fields, since Lucene doesn't allow terms whose UTF-8 encoding is longer than 32766 bytes.

      See https://discuss.elastic.co/t/error-document-contains-at-least-one-immense-term-in-field/66486

We have decided to always create keyword fields so that index definitions don't need to specify properties like ordered or facet. This way every field can be sorted on or used as a facet.

In this specific case the keyword field is not needed at all, but it would be hard to decide when to include it and when not. To solve this we are going to use `ignore_above=256`, so huge values will simply not be indexed as keywords.
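A minimal sketch of what such a mapping could look like (the field name `text` matches the failing field from the log above; the rest is illustrative, not the actual Oak index definition):

```json
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

With `ignore_above`, values longer than 256 characters are still stored and still searchable through the analyzed `text` field; they are only skipped for the `text.keyword` subfield, so indexing no longer fails on documents with immense values, at the cost of not being able to sort or facet on those specific values.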


People

  Assignee/Reporter: Fabrizio Fortino (fortino)
