  1. Solr
  2. SOLR-15260

Precompute snippet delimiter breaks for the UnifiedHighlighter

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None

    Description

      The BreakIterator implementation used by the UnifiedHighlighter can play a significant role in highlighting performance. The default ones come from the JDK, so we have no control over them; they may well be optimized, but they have a complicated job to do. I propose that the break locations be computed at indexing time in a Solr UpdateRequestProcessor and placed into a pre-analyzed common field, named perhaps _highlighter_breaks_, that needs indexed=true plus offsets. In this field, the term is the actual field name, the position is meaningless, and the offset pair refers to the span between breaks (typically a sentence). This data can be stored efficiently in Lucene. The UnifiedHighlighter already has a flexible BreakIterator producer, but it is not notified of the current document, so changes would be needed there (a separate LUCENE issue).
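
      To make the proposal concrete, here is a minimal sketch of the two halves using only the JDK. It is an illustration under assumptions, not the intended implementation: the names HighlighterBreaks, BreakSpan, computeSpans, and spanContaining are hypothetical, and no Solr or Lucene API is used. In a real UpdateRequestProcessor, each span would presumably be emitted as a token in the pre-analyzed _highlighter_breaks_ field (term = field name, offsets = span), and the lookup would read those offsets back for the current document.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Solr or Lucene.
final class HighlighterBreaks {

    /** One precomputed passage: [start, end) character offsets into a field's value. */
    static final class BreakSpan {
        final String fieldName; // assumed to become the term in _highlighter_breaks_
        final int start;        // inclusive start offset
        final int end;          // exclusive end offset

        BreakSpan(String fieldName, int start, int end) {
            this.fieldName = fieldName;
            this.start = start;
            this.end = end;
        }
    }

    /** Index-time step: run the JDK sentence BreakIterator once and record every span. */
    static List<BreakSpan> computeSpans(String fieldName, String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        List<BreakSpan> spans = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            spans.add(new BreakSpan(fieldName, start, end));
        }
        return spans;
    }

    /** Query-time step: binary-search the sorted spans for the one containing offset, or null. */
    static BreakSpan spanContaining(List<BreakSpan> spans, int offset) {
        int lo = 0;
        int hi = spans.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            BreakSpan s = spans.get(mid);
            if (offset < s.start) {
                hi = mid - 1;
            } else if (offset >= s.end) {
                lo = mid + 1;
            } else {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String body = "Solr is fast. Highlighting can be slow. Precompute the breaks.";
        for (BreakSpan s : computeSpans("body", body)) {
            System.out.println(s.fieldName + " [" + s.start + "," + s.end + ") "
                + body.substring(s.start, s.end));
        }
    }
}
```

      The point of the split is that the expensive BreakIterator pass runs once per document at indexing time, while highlighting only does a cheap lookup over sorted offsets.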

          People

            Assignee: Unassigned
            Reporter: David Smiley (dsmiley)
            Votes: 0
            Watchers: 1