Solr / SOLR-2304

MoreLikeThis: Apply field level boosts before query terms are selected

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.4.2
    • Fix Version/s: 4.9, 6.0
    • Component/s: MoreLikeThis
    • Labels: None

    Description

      MoreLikeThis supports field-level boosts that weight the importance of each field when selecting similar documents. Currently, in trunk, these boosts are applied only after the query terms have been selected from the priority queue of interesting terms in MoreLikeThis. This can give unexpected results when combined with mlt.maxqt, which limits the number of query terms. For example, suppose you use fields fieldA and fieldB, boost them as "fieldA^0.5 fieldB^2.0", and set maxqt to 20. If the terms in fieldA have relatively higher tf-idf scores than those in fieldB, all 20 terms selected as the basis for the MoreLikeThis query will come from fieldA, even if fieldB contains terms whose scores would be higher once boosted.
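
      The ordering problem can be illustrated with a minimal, self-contained sketch. It is not the attached patch and does not use the Lucene MoreLikeThis API. The class BoostOrderSketch, the ScoredTerm type, the selectTerms method, and all scores are hypothetical, chosen only to show how truncating to maxqt before boosting differs from truncating after boosting.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class BoostOrderSketch {

    /** Hypothetical candidate term with its unboosted tf-idf score. */
    static final class ScoredTerm {
        final String field;
        final String text;
        final double score;

        ScoredTerm(String field, String text, double score) {
            this.field = field;
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Returns the top maxQueryTerms candidates as "field:text" strings.
     * With boostBeforeSelection = false, terms are ranked by raw score
     * (the behaviour described in this issue). With true, terms are
     * ranked by score * field boost (the behaviour proposed here).
     */
    static List<String> selectTerms(List<ScoredTerm> candidates,
                                    Map<String, Double> fieldBoosts,
                                    int maxQueryTerms,
                                    boolean boostBeforeSelection) {
        List<ScoredTerm> ranked = new ArrayList<>(candidates);
        Comparator<ScoredTerm> byRank = Comparator.comparingDouble((ScoredTerm t) -> {
            double boost = boostBeforeSelection
                    ? fieldBoosts.getOrDefault(t.field, 1.0)
                    : 1.0;
            return t.score * boost;
        }).reversed();
        ranked.sort(byRank);

        // Truncate to the query-term limit, as mlt.maxqt would.
        List<String> selected = new ArrayList<>();
        for (ScoredTerm t : ranked.subList(0, Math.min(maxQueryTerms, ranked.size()))) {
            selected.add(t.field + ":" + t.text);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up scores: fieldA terms are rarer, so their raw tf-idf is higher.
        List<ScoredTerm> candidates = List.of(
                new ScoredTerm("fieldA", "obscure", 8.0),
                new ScoredTerm("fieldA", "rare", 7.0),
                new ScoredTerm("fieldB", "comedy", 3.0),
                new ScoredTerm("fieldB", "action", 2.5));
        Map<String, Double> boosts = Map.of("fieldA", 0.5, "fieldB", 2.0);

        // Boost applied after selection: fieldB never makes the cut.
        System.out.println(selectTerms(candidates, boosts, 2, false));
        // Boost applied before selection: boosted fieldB terms win.
        System.out.println(selectTerms(candidates, boosts, 2, true));
    }
}
```

      With these made-up numbers and a limit of 2, the first call prints [fieldA:obscure, fieldA:rare] and the second prints [fieldB:comedy, fieldB:action], because the boosted scores are 4.0 and 3.5 for fieldA against 6.0 and 5.0 for fieldB.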

      I encountered this while using descriptive document text and document tags (comedy, action, etc.) as the basis for MoreLikeThis. I wanted to boost the tags higher; however, the less common document text terms were always selected as the query terms, while the more common tag terms were eliminated by the maxqt parameter before their scores were boosted.

      I believe the code was originally written this way so that the bulk of the work could be done in MoreLikeThisHandler without modifying the MoreLikeThis class in the Lucene project. Now that the projects are merged, I think this modification makes sense. I will attach a simple patch against trunk.

      Attachments

        1. SOLR-2304.patch
          5 kB
          Mike Mattozzi

          People

            Assignee: Unassigned
            Reporter: Mike Mattozzi (mmattozzi)
            Votes: 1
            Watchers: 5