  Solr / SOLR-12808

Wrong highlighting using PatternReplaceCharFilterFactory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.2.1, 7.4, 7.5
    • Fix Version/s: None
    • Component/s: highlighter
    • Labels: None
    • Environment:
      Java: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_162 25.162-b12
      OS: Linux Debian 8.11

    Description

      Hi,
      the default highlighter seems to produce wrong highlights when the field's analyzer uses PatternReplaceCharFilterFactory.

      My query is: verb_esame_num_tnv:(00031665 0035 9)

      The field type used by the field "verb_esame_num_tnv" is:

      <fieldType name="text_num_verbale" class="solr.TextField" positionIncrementGap="100">
         <analyzer>
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="^0*([0-9]+\s+[0-9]+\s+[0-9]+)$" replacement=" $1"/>
            <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
            <tokenizer class="solr.StandardTokenizerFactory"/>
         </analyzer>
      </fieldType>
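
      To make the effect of the two char filters concrete, here is a minimal sketch that applies the same two patterns with plain java.util.regex, outside Solr. The class name CharFilterSketch and the sample input are assumptions for illustration (the input is simply the text of the query terms); this is not Solr's own code path.

```java
import java.util.regex.Pattern;

public class CharFilterSketch {
    // Patterns copied from the fieldType definition in this report.
    private static final Pattern LEADING_ZEROS =
            Pattern.compile("^0*([0-9]+\\s+[0-9]+\\s+[0-9]+)$");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Applies the two replacements in the order they are configured.
    static String applyCharFilters(String input) {
        String step1 = LEADING_ZEROS.matcher(input).replaceAll(" $1");
        return WHITESPACE.matcher(step1).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Assumed sample value, taken from the query terms.
        String original = "00031665 0035 9";
        String filtered = applyCharFilters(original);
        System.out.println("[" + original + "] length " + original.length());
        System.out.println("[" + filtered + "] length " + filtered.length());
    }
}
```

      For this input the filtered text is shorter than the original and starts with a space, so character positions in the filtered text no longer line up with positions in the original text.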
      

      I've attached a screenshot of the text analysis.

      It seems that the highlighter applies wrong offsets to the original text when highlighting the matched tokens.
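
      The following sketch only illustrates what such an offset mismatch would look like; it does not reproduce the highlighter's code. It assumes the stored value is "00031665 0035 9" and that the char filters turn it into " 31665 0035 9". The class OffsetMismatchSketch and the helper mark are hypothetical names, and the constant shift is valid for this one example only.

```java
public class OffsetMismatchSketch {
    // Returns text with [ ] inserted around the character range [start, end).
    static String mark(String text, int start, int end) {
        return text.substring(0, start)
                + "[" + text.substring(start, end) + "]"
                + text.substring(end);
    }

    public static void main(String[] args) {
        String original = "00031665 0035 9"; // assumed stored value
        String filtered = " 31665 0035 9";   // assumed char filter output

        // Offsets of the first token, measured in the filtered text.
        String token = "31665";
        int start = filtered.indexOf(token);
        int end = start + token.length();

        // The offsets select the token in the filtered text...
        System.out.println(mark(filtered, start, end));
        // ...but select different characters when reused on the original.
        System.out.println(mark(original, start, end));

        // In this example the filtered text is shorter by a fixed amount,
        // so shifting by that amount recovers the intended range.
        int shift = original.length() - filtered.length();
        System.out.println(mark(original, start + shift, end + shift));
    }
}
```

      Lucene's CharFilter exposes correctOffset(int) for mapping offsets in filtered text back to the original input; whether that mapping is what goes wrong here is not established by this report.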

      Hope this helps.

      Regards.

      Attachments

        1. text_analysis.png
          91 kB
          Federico Grillini

          People

            Assignee: Unassigned
            Reporter: Federico Grillini (f.grillini@kion.it)
            Votes: 0
            Watchers: 3
