Solr / SOLR-12958

Statistical Phrase Identifier should return phrases found in a single field

Details

    Description

      It has come to my attention that the phrase identifier introduced in SOLR-9418 does not return phrases that are found in only one of the fields specified by phrases.fields.
      This has proved troublesome for our use case.
      The offending lines seem to be:

      final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
        .filter(p -> 0.0D < p.getTotalScore())
        .sorted(Comparator.comparing((p -> p.getTotalScore()), Collections.reverseOrder()))
        .collect(Collectors.toList());

      Fields in which the phrase is not present return a score of -1.0, while fields that contain the phrase return a score in the range 0.0 <= score <= 1.0. As a result, the total score turns out negative and the phrase gets filtered out.
      I separated the filter into two distinct cases:
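
To make the failure concrete, here is a minimal sketch. It assumes, purely for illustration, that the total score is a plain sum of the per-field scores; the actual aggregation in the component may differ (for example, it may be weighted). The class name, method names, and field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TotalScoreDemo {
    // Hypothetical aggregation: a plain sum of per-field scores (an assumption,
    // not necessarily what the component does).
    static double totalScore(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
            .mapToDouble(Double::doubleValue)
            .sum();
    }

    // Mirrors the predicate of the existing filter: 0.0D < totalScore.
    static boolean passesExistingFilter(Map<String, Double> fieldScores) {
        return 0.0D < totalScore(fieldScores);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", 0.5D);  // phrase found in this field
        scores.put("body", -1.0D);  // phrase absent from this field

        System.out.println(totalScore(scores));           // -0.5
        System.out.println(passesExistingFilter(scores)); // false
    }
}
```

Under that assumption, a phrase with a perfectly reasonable score in one field is still dropped because the -1.0 from the other field drags the total below zero.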

      1. Filter out single word phrases (phrases.singleWordPhrases is set to false)
      2. Include single word phrases (phrases.singleWordPhrases is set to true)

      This is implemented by the following change to the component's logic:

      if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
        // filter out single word phrases: keep a phrase only if at least one field score is strictly positive
        phraseStream = contextData.allPhrases.stream()
            .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore > 0.0D));
      } else {
        // include single word phrases, which return a constant score of 0.0
        phraseStream = contextData.allPhrases.stream()
            .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore >= 0.0D));
      }
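
The effect of the two branches can be tried out in isolation with the self-contained sketch below. The `Phrase` class here is a hypothetical stand-in that carries only a text and a `fieldScores` map, and the `keep` helper and sample data are invented for illustration; none of this is the component's actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PhraseFilterSketch {
    // Hypothetical stand-in for the component's Phrase class.
    static final class Phrase {
        final String text;
        final Map<String, Double> fieldScores;

        Phrase(String text, Map<String, Double> fieldScores) {
            this.text = text;
            this.fieldScores = fieldScores;
        }
    }

    // Keeps a phrase if any single field score clears the threshold:
    // strictly positive when single word phrases are excluded,
    // zero or above when they are included.
    static List<String> keep(List<Phrase> phrases, boolean singleWordPhrases) {
        return phrases.stream()
            .filter(p -> p.fieldScores.values().stream()
                .anyMatch(s -> singleWordPhrases ? s >= 0.0D : s > 0.0D))
            .map(p -> p.text)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Phrase> phrases = List.of(
            new Phrase("brown fox", Map.of("title", 0.5D, "body", -1.0D)), // in one field only
            new Phrase("fox", Map.of("title", 0.0D, "body", 0.0D)),        // single word phrase
            new Phrase("lazy cat", Map.of("title", -1.0D, "body", -1.0D))  // in no field
        );

        System.out.println(keep(phrases, false)); // [brown fox]
        System.out.println(keep(phrases, true));  // [brown fox, fox]
    }
}
```

Because the predicate looks at each field score on its own rather than at an aggregate, a phrase present in only one field survives regardless of how the absent fields score.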

      Attachments

        1. SOLR-12958.patch
          9 kB
          mosh

          People

            Assignee: Unassigned
            Reporter: mosh (moshebla)