Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
json.facet refinement is currently "pessimistic" by default. Specifically, "long tail" terms that are not in the "top n" on every shard, but are in the "top n + overrequest" on at least one shard, are in some cases not refined and therefore not included in the aggregated response.
This differs from the "optimistic" approach taken by the existing facet.field and facet.pivot refinement, which refines every known term whose count might be high enough to put it in the top N, based on what is known about the lowest count returned by each shard in phase #1.
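To make the "optimistic" idea concrete, here is a minimal Python sketch of how refinement candidates could be chosen from phase #1 results. It is an illustration of the reasoning described above, not Solr's actual implementation: the function name, the `shard_results` structure, and the sample counts are all hypothetical, and it assumes facets sorted by count descending.

```python
from collections import defaultdict

def optimistic_refinement_candidates(shard_results, limit, shard_limit):
    """Pick terms worth refining, and the shards to ask (hypothetical helper).

    shard_results: {shard: {term: count}} as returned in phase #1.
    limit:         the "top n" the client asked for.
    shard_limit:   terms requested from each shard (n + overrequest).
    """
    # A shard that returned a full list may still hold unreturned terms,
    # each with a count no higher than the lowest count it did return.
    unseen_ceiling = {}
    for shard, counts in shard_results.items():
        truncated = len(counts) >= shard_limit
        unseen_ceiling[shard] = min(counts.values()) if truncated and counts else 0

    # Sum the counts that are already known for every term.
    known = defaultdict(int)
    for counts in shard_results.values():
        for term, count in counts.items():
            known[term] += count

    # The count a term must reach to enter the current top n.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    candidates = {}
    for term, total in known.items():
        missing = sorted(s for s, c in shard_results.items() if term not in c)
        best_case = total + sum(unseen_ceiling[s] for s in missing)
        if missing and best_case >= threshold:
            candidates[term] = missing  # shards to query in the refinement phase
    return candidates

# Made-up phase #1 results: top 2 requested, 3 terms fetched per shard.
phase1 = {
    "A": {"x": 20, "y": 10, "z": 3},
    "B": {"x": 18, "w": 8, "v": 2},
}
candidates = optimistic_refinement_candidates(phase1, limit=2, shard_limit=3)
print(candidates)  # {'y': ['B'], 'w': ['A']}
```

In this sample, "w" is a long tail term: it is outside the known top 2, but its best-case total (8 + 3 = 11) could beat the current threshold of 10, so it is refined. "z" and "v" cannot reach the threshold even in the best case, so they are skipped.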
A mitigating option for people particularly concerned about long tail terms is to set a "high" value for the overrefine parameter, forcing Solr to refine more terms from phase #1. This is somewhat of a "brute force" workaround, since it doesn't take into account any known information about each shard's results from phase #1.
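As a rough illustration of that workaround, the following Python sketch builds a json.facet terms facet with refinement enabled and a large overrefine value. The facet name "categories", the field "cat", and the specific numbers are assumptions chosen for the example; only the `refine`, `overrequest`, and `overrefine` options come from the JSON Facet API itself.

```python
import json

def terms_facet(field, limit, overrequest, overrefine):
    """Build a json.facet terms facet definition (hypothetical helper)."""
    return {
        "type": "terms",
        "field": field,
        "limit": limit,
        "refine": True,              # enable the refinement phase
        "overrequest": overrequest,  # extra buckets fetched per shard in phase #1
        "overrefine": overrefine,    # extra buckets considered for refinement
    }

# "categories" and "cat" are placeholder names, not taken from this issue.
facet = {"categories": terms_facet("cat", limit=10, overrequest=20, overrefine=200)}

# Request parameters as they might be sent to a collection's /select handler.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}
print(params["json.facet"])
```

A larger overrefine value widens the set of buckets refined on every request, regardless of what the shards reported, which is why it costs more than a selection based on per-shard counts.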
This issue tracks possible improvements that would make the faceting code more sophisticated in how it chooses terms to refine.
(NOTE: this Jira was originally filed as a bug report noting that json.facet refinement didn't seem to be working properly compared to facet.field refinement, and the early comments were written with that mindset.)
Issue Links
- is related to
  - SOLR-11729 Increase default overrequest ratio/count in json.facet to match existing defaults for facet.overrequest.ratio & facet.overrequest.count ? (Closed)
- relates to
  - SOLR-12343 JSON Field Facet refinement can return incorrect counts/stats for sorted buckets (Closed)