[SOLR-6248] MoreLikeThis Query Parser - ASF JIRA

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.0, 6.0
Component/s: query parsers
Labels: None

Description

The MLT component doesn't let users highlight or paginate results, and the MLT handler comes at the cost of maintaining another piece of configuration. Also, any changes to the defaults of the /select handler (number of results to be fetched, etc.) need to be copied to, and kept in sync with, this handler.

Having an MLT QParser would let users get back docs based on a query, which they can then paginate, highlight, etc. It would also give them the flexibility to use it anywhere a query is accepted, i.e. q, fq, bq, etc.
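To illustrate the idea, here is a minimal sketch of how a client might build such a request. It assumes the local-params form `{!mlt qf=<field>}<docId>` described in the Solr Reference Guide for the MLT query parser; the function names, the document ID, and the field name are hypothetical and not part of this issue.

```python
from urllib.parse import urlencode


def mlt_query(doc_id, field, **opts):
    """Build an MLT local-params query, e.g. {!mlt qf=title mintf=1}doc1."""
    params = {"qf": field, **opts}
    local = " ".join(f"{key}={value}" for key, value in params.items())
    return f"{{!mlt {local}}}{doc_id}"


def select_params(doc_id, field, rows=10, start=0, highlight=False):
    """Parameters for an ordinary /select request whose main query is MLT.

    Because the MLT query is just the value of q, the usual paging
    (rows/start) and highlighting (hl) parameters apply unchanged.
    """
    params = {"q": mlt_query(doc_id, field), "rows": rows, "start": start}
    if highlight:
        params["hl"] = "true"
        params["hl.fl"] = field
    return params


def select_query_string(doc_id, field, **kwargs):
    """URL-encode the parameters so they can be appended to a /select URL."""
    return urlencode(select_params(doc_id, field, **kwargs))
```

The same `mlt_query(...)` string could equally be passed as an `fq` or `bq` value, which is the flexibility the description refers to.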

A bit of history about MLT (thanks to Hoss)

The MLT handler pre-dates the existence of QParsers and was meant to take an arbitrary query as input, find docs that match that query, club them together to find interesting terms, and then use those terms as if they were the main query to generate the main result set.

This result would then be used as the set to facet, highlight etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)

The MLT component, on the other hand, serves a very different purpose: augmenting the main result set. It is used to get similar docs for each doc in the main result set.

DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
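The two flows can be contrasted with a toy in-memory sketch. Everything here is illustrative: the corpus, the function names, and the scoring are assumptions, and raw term frequency stands in for the real term-selection logic, which this issue does not describe.

```python
from collections import Counter

# Toy corpus standing in for an index: doc id -> text (illustrative data only).
DOCS = {
    "a": "solr query parser highlight",
    "b": "solr query parser paginate",
    "c": "lucene index segment merge",
    "d": "solr facet highlight",
}


def search(terms, exclude=()):
    """Query -> DocList: rank docs by how many of the given terms they contain."""
    scores = {}
    for doc_id, text in DOCS.items():
        if doc_id in exclude:
            continue
        hits = len(set(terms) & set(text.split()))
        if hits:
            scores[doc_id] = hits
    return sorted(scores, key=lambda d: (-scores[d], d))


def interesting_terms(doc_ids, top_n=3):
    """DocList -> Bag (terms): pool the docs' terms and keep the most frequent.

    Ties are broken alphabetically so the result is deterministic.
    """
    bag = Counter()
    for doc_id in doc_ids:
        bag.update(DOCS[doc_id].split())
    ranked = sorted(bag.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def handler_flow(query_terms):
    """Handler: Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)."""
    seed_docs = search(query_terms)
    # Assumption of this sketch: the seed docs are left out of the final list.
    return search(interesting_terms(seed_docs), exclude=seed_docs)


def component_flow(result_ids):
    """Component: DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)."""
    return {d: search(interesting_terms([d]), exclude=[d]) for d in result_ids}
```

The handler produces a single result list that can itself be faceted or highlighted, while the component produces one small list per doc of an existing result set.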

The new approach:

All of this can be done better and cleaner (and makes more sense too) using an MLT QParser.

An important thing to handle here is the case where the user doesn't have TermVectors enabled, in which case it falls back to what happens right now, i.e. parsing stored fields.

Also, if the field to be used for MLT is not indexed, it would need to be a TextField with an index analyzer defined. This analyzer would then be used to extract terms for MLT.

In SolrCloud mode, '/get-termvectors' can be used after checking the schema to see whether TermVectors are enabled for the field. If not, a /get call can be used to fetch the field and parse it.
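The fallback order described above could be sketched roughly as follows. This is only one reading of the description, not the patch's logic: `FieldInfo`, `term_source`, and the returned labels are hypothetical, and the schema checks are reduced to three booleans.

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    """Hypothetical, simplified view of a schema field used for MLT."""
    term_vectors: bool
    stored: bool
    has_index_analyzer: bool


def term_source(field, cloud_mode=False):
    """Pick where MLT terms would come from, preferring term vectors."""
    if field.term_vectors:
        # Term vectors are available: use them directly, or fetch them
        # via '/get-termvectors' in SolrCloud mode.
        return "/get-termvectors" if cloud_mode else "term vectors"
    if field.stored and field.has_index_analyzer:
        # No term vectors: fetch the stored value (via /get in SolrCloud
        # mode) and run it through the field's index analyzer.
        return "/get + index analyzer" if cloud_mode else "stored field + index analyzer"
    raise ValueError("field has neither term vectors nor an analyzable stored value")
```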

Attachments

SOLR-6248.patch (04/Nov/14 23:48, 5 kB, Anshum Gupta)
SOLR-6248.patch (28/Oct/14 08:27, 27 kB, Anshum Gupta)
SOLR-6248.patch (28/Oct/14 06:50, 27 kB, Anshum Gupta)
SOLR-6248.patch (26/Oct/14 20:52, 25 kB, Anshum Gupta)
SOLR-6248.patch (23/Oct/14 05:09, 47 kB, Anshum Gupta)
SOLR-6248.patch (23/Jul/14 21:35, 71 kB, Vitaliy Zhovtyuk)
SOLR-6248-4x.patch (14/Jan/15 13:37, 32 kB, Markus Jelsma)
SOLR-6248-4x.patch (13/Jan/15 12:44, 31 kB, Markus Jelsma)

Issue Links

is related to

SOLR-7883 MoreLikeThis is incompatible with facets (Closed)
SOLR-16420 MoreLikeThis Content Query Parser (Closed)
SOLR-5480 Make MoreLikeThisHandler distributable (Open)

relates to

SOLR-7913 Add stream.body support to MLT QParser (Resolved)

People

Assignee: Anshum Gupta
Reporter: Anshum Gupta
Votes: 1
Watchers: 15

Dates

Created: 15/Jul/14 17:20
Updated: 20/Sep/22 09:10
Resolved: 12/Jan/15 23:57