Solr / SOLR-16652

multi-term synonym rule applied at query time prevents single-term matching

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 9.1
    • Fix Version/s: None
    • Component/s: query parsers
    • Labels: None

    Description

      The presence of a multi-term synonym equivalence rule applied at query time prevents matching on individual terms in the synonym.

      If we issue an edismax query against a text_general field in Solr 9.1 and the query string is "foo bar", we can match documents that have "foo" without "bar" and vice versa. However, if a synonym rule like "foo bar,baz" is applied at query time, we no longer get single-term matches against "foo" or "bar". Both terms are now required, but they can occur in any position: a document matches the query if it contains "foo bar", "bar foo", or "bar qux foo", for example, but not if it contains only "foo".

      However, if we change the text_general analysis chain to apply synonyms at index time, the observed behavior also changes and single-term matches for "foo" or "bar" are again possible.

      Why is this an issue? 1) It is counterintuitive that a synonym equivalence (as opposed to a unidirectional mapping) would give narrower recall than having no rule at all. 2) This behavior is a discrepancy in semantics between index-time and query-time synonym expansion.

       

      STEPS TO REPRODUCE

      Use the _default configset with "foo bar,baz" added to synonyms.txt. Index these four docs:

       

      {"id":"1", "title_txt":"foo"}
      {"id":"2", "title_txt":"bar"}
      {"id":"3", "title_txt":"foo bar"}
      {"id":"4", "title_txt":"bar foo"}

       

       
      Issue a query for "foo bar" (i.e. defType=edismax&q.op=OR&qf=title_txt&q=foo bar).
      Result: Only docs 3 and 4 come back.
       
      Issue a query for "bar foo".
      Result: All four docs come back; the synonym rule is not invoked because the query terms do not appear in the order given in the rule.
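
      The reproduction steps above could be scripted roughly as follows. This is only a sketch: the host, port, and the collection name "synonym_test" are assumptions that do not come from the report, and the fl=id and debug=query parameters are added here just to keep the response small and to show the parsed query. It presumes a collection created from the _default configset with "foo bar,baz" already added to synonyms.txt.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from the report): Solr listens on localhost:8983 and the
# collection is named "synonym_test".
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "synonym_test"

# The four documents from the report.
DOCS = [
    {"id": "1", "title_txt": "foo"},
    {"id": "2", "title_txt": "bar"},
    {"id": "3", "title_txt": "foo bar"},
    {"id": "4", "title_txt": "bar foo"},
]


def update_request(docs):
    """Build a POST request that indexes the docs and commits them."""
    url = f"{SOLR_BASE}/{COLLECTION}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def query_url(q):
    """Build the select URL using the query parameters given in the report."""
    params = {
        "defType": "edismax",
        "q.op": "OR",
        "qf": "title_txt",
        "q": q,
        "fl": "id",        # added here: return only ids
        "debug": "query",  # added here: include the parsed query in the response
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?" + urllib.parse.urlencode(params)


def matching_ids(q):
    """Run the query and return the sorted ids of the matching docs."""
    with urllib.request.urlopen(query_url(q)) as resp:
        result = json.load(resp)
    return sorted(doc["id"] for doc in result["response"]["docs"])


def reproduce():
    """Index the docs, then print the matching ids for both query orders."""
    urllib.request.urlopen(update_request(DOCS)).close()
    for q in ("foo bar", "bar foo"):
        print(q, "->", matching_ids(q))
```

      Calling reproduce() against a running instance should print the ids returned for each query, which can then be compared with the results listed above.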
       

      OBSERVATIONS

      Note that we could change the synonym rule to "foo bar,baz,foo,bar" but this would mean that a query for "foo" could now match a document containing only "bar", which is not the intent of the original rule.

      Note that we could set sow=true, but this would prevent the multi-term synonym from taking effect: the "foo bar" query could now get single-term matches on "foo" or "bar" but couldn't get a match on the synonym "baz".
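
      The two workarounds above can be illustrated with a toy model. This is not Solr's synonym filter: the helper names are hypothetical, and a rule is taken to fire only when the text handed to analysis is exactly one of the rule's entries. Under that assumption, it shows why the widened rule makes "foo" expand to "bar", and why splitting on whitespace keeps "foo bar" from ever reaching the rule as a unit.

```python
def parse_rule(line):
    """Split an equivalence rule such as "foo bar,baz" into token tuples."""
    return [tuple(entry.split()) for entry in line.split(",")]


def synonyms_for(chunk, rule_line):
    """Return the other rule entries if `chunk` is exactly one of the entries."""
    entries = parse_rule(rule_line)
    tokens = tuple(chunk.split())
    if tokens not in entries:
        return []
    return [" ".join(entry) for entry in entries if entry != tokens]


def analyzed_chunks(query, sow):
    """Model sow: with sow=true each whitespace-separated term is analyzed alone."""
    return query.split() if sow else [query]


# Widened rule: a query for "foo" now expands to "bar" as well.
widened = synonyms_for("foo", "foo bar,baz,foo,bar")

# Original rule: with sow=false the whole query reaches the rule,
# with sow=true the rule never sees "foo bar" together.
with_sow_false = [synonyms_for(c, "foo bar,baz") for c in analyzed_chunks("foo bar", sow=False)]
with_sow_true = [synonyms_for(c, "foo bar,baz") for c in analyzed_chunks("foo bar", sow=True)]
```

      In this model, widened contains "bar", with_sow_false contains "baz", and with_sow_true contains no synonyms at all.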
       
      Returning to the original "foo bar,baz" synonym rule with sow=false, if we look at the explain output for the "foo bar" query we see:

      +((title_txt:baz (+title_txt:foo +title_txt:bar)))
       
      Looking at the explain output for "bar foo" we see:

      +((title_txt:bar) (title_txt:foo))
       
      So, the observed behavior makes sense according to the low-level query structure, but is still counterintuitive for the reasons described above.
       
      Why not expand the "foo bar" query like this instead?
       
      +((title_txt:baz (title_txt:foo title_txt:bar)))
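
      To compare the three query structures, here is a small sketch that evaluates them against the four test documents. It is a simplified model of boolean matching, not Lucene itself: a group matches when all of its required ("+") clauses match, or, if it has none, when at least one optional clause matches. Scoring, term positions, and minimum-should-match are ignored, and the nested-list encoding of the queries is an assumption made for illustration.

```python
MUST, SHOULD = "+", ""

# Token sets for the four documents from the report.
DOC_TOKENS = {
    "1": {"foo"},
    "2": {"bar"},
    "3": {"foo", "bar"},
    "4": {"bar", "foo"},
}


def matches(node, tokens):
    """A node is either a term (str) or a list of (occur, node) clauses."""
    if isinstance(node, str):
        return node in tokens
    must = [child for occur, child in node if occur == MUST]
    should = [child for occur, child in node if occur == SHOULD]
    if must:
        # Required clauses decide the match; optional ones would only affect scoring.
        return all(matches(child, tokens) for child in must)
    return any(matches(child, tokens) for child in should)


def matching_ids(query):
    """Return the sorted ids of the documents that satisfy the query."""
    return sorted(i for i, toks in DOC_TOKENS.items() if matches(query, toks))


# +((title_txt:baz (+title_txt:foo +title_txt:bar)))   observed for "foo bar"
observed = [(MUST, [(SHOULD, "baz"), (SHOULD, [(MUST, "foo"), (MUST, "bar")])])]

# +((title_txt:bar) (title_txt:foo))                   observed for "bar foo"
reversed_order = [(MUST, [(SHOULD, "bar"), (SHOULD, "foo")])]

# +((title_txt:baz (title_txt:foo title_txt:bar)))     proposed for "foo bar"
proposed = [(MUST, [(SHOULD, "baz"), (SHOULD, [(SHOULD, "foo"), (SHOULD, "bar")])])]

for name, query in [("observed", observed), ("reversed", reversed_order), ("proposed", proposed)]:
    print(name, matching_ids(query))
```

      Under this model the observed structure matches only docs 3 and 4, while both the reversed-order query and the proposed expansion match all four docs, which is consistent with the results reported in the reproduction steps.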
       

       

       

          People

            Assignee: Unassigned
            Reporter: Rudi Seitz (rseitz)
            Votes: 0
            Watchers: 4

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 40m