  1. Solr
  2. SOLR-13807

Caching for term facet counts

Details

    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.2, 9.0
    • Fix Version/s: None
    • Component/s: Facet Module
    • Labels: None

    Description

      Solr does not have a facet count cache, so every request recalculates term facets for every facet field: it iterates over every field value of every doc in the result domain and increments the count associated with that value.
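      The counting loop described above can be sketched as follows. This is a minimal, simplified model rather than Solr's actual faceting code: the class name, the `docTermOrds` layout (one array of term ordinals per doc), and the `countTerms` method are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical, simplified model of term facet counting; not Solr's real implementation.
class TermFacetCountSketch {

    /**
     * Counts term occurrences for one field over a result domain.
     *
     * @param docTermOrds docTermOrds[doc] holds the term ordinals of the field for that doc
     * @param domain      doc ids matched by the request
     * @param numTerms    number of distinct terms (ordinals) in the field
     */
    static int[] countTerms(int[][] docTermOrds, int[] domain, int numTerms) {
        int[] counts = new int[numTerms];      // allocated anew on every request
        for (int doc : domain) {               // work grows with the size of the domain
            for (int ord : docTermOrds[doc]) { // every value of the field for this doc
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTermOrds = { {0, 2}, {1}, {0}, {2, 1} };
        int[] domain = {0, 2, 3};
        // Prints [2, 1, 2]
        System.out.println(Arrays.toString(countTerms(docTermOrds, domain, 3)));
    }
}
```

      Without a cache, a repeated request over the same domain runs this loop again and allocates a fresh count array each time.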

      As a result, subsequent requests end up redoing a lot of the same work, including all associated object allocation, GC, etc. This situation could benefit from integrated caching.

      Because of the domain-based, serial/iterative nature of term facet calculation, latency is proportional to the size of the result domain. Consequently, one common/clear manifestation of this issue is high latency for faceting over an unrestricted domain (e.g., *:*), as might be observed on a top-level landing page that exposes facets. This type of "static" case is often mitigated by external (to Solr) caching, either with a caching layer between Solr and a front-end application, or within a front-end application, or even with a caching layer between the end user and a front-end application.

      Handling this caching elsewhere in the stack carries overhead, and a new user may not even be aware of the issue as something to mitigate. Beyond that, external caching is really only appropriate for relatively static cases like the "landing page" example described above. A Solr-internal facet count cache (analogous to the filterCache) would provide the following additional benefits:

      1. ease of use/out-of-the-box configuration to address a common performance concern
      2. compact (specifically caching count arrays, without the extra baggage that accompanies a naive external caching approach)
      3. NRT-friendly (could be implemented to be segment-aware)
      4. modular, capable of reusing the same cached values in conjunction with variant requests over the same result domain (this would support common use cases like paging, but also potentially more interesting direct uses of facets)
      5. could be used for distributed refinement (i.e., if facet counts over a given domain are cached, a refinement request could simply look up the ordinal value for each enumerated term and directly grab the count out of the count array that was cached during the first phase of facet calculation)
      6. composable (e.g., in aggregate functions that calculate values based on facet counts across different domains, like SKG/relatedness – see SOLR-13132)
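      As a rough illustration of benefits 4 and 5, the sketch below caches one count array per (field, domain) pair and reuses it for paging and for refinement-style lookups by ordinal. Everything here is hypothetical: the class, the string cache key, and the method names are assumptions for illustration, not an existing Solr API, and eviction, sizing, and segment awareness are deliberately left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a facet count cache; not an existing Solr class.
class FacetCountCacheSketch {
    // Key: field name plus a string identifying the result domain (an assumed encoding).
    private final Map<String, int[]> cache = new HashMap<>();
    int misses = 0; // how many times counts actually had to be computed

    /** Returns the cached count array, computing it only on the first request. */
    int[] getCounts(String field, String domainKey, Supplier<int[]> compute) {
        String key = field + "|" + domainKey;
        int[] counts = cache.get(key);
        if (counts == null) {
            misses++;
            counts = compute.get();
            cache.put(key, counts);
        }
        return counts;
    }

    /** Paging: term ordinals sorted by count (descending), ties broken by ordinal. */
    static int[] topOrds(int[] counts, int offset, int limit) {
        Integer[] ords = new Integer[counts.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        Arrays.sort(ords, (a, b) -> counts[b] != counts[a]
                ? Integer.compare(counts[b], counts[a])
                : Integer.compare(a, b));
        int from = Math.min(offset, ords.length);
        int to = Math.min(from + limit, ords.length);
        int[] page = new int[to - from];
        for (int i = from; i < to; i++) {
            page[i - from] = ords[i];
        }
        return page;
    }

    /** Refinement: read the count for one term ordinal straight out of the cached array. */
    static int countForOrd(int[] counts, int ord) {
        return counts[ord];
    }

    public static void main(String[] args) {
        FacetCountCacheSketch cache = new FacetCountCacheSketch();
        Supplier<int[]> compute = () -> new int[] {5, 1, 9, 3}; // stand-in for real counting
        int[] counts = cache.getCounts("category", "*:*", compute);
        System.out.println(Arrays.toString(topOrds(counts, 0, 2))); // first page
        counts = cache.getCounts("category", "*:*", compute);       // served from the cache
        System.out.println(Arrays.toString(topOrds(counts, 2, 2))); // second page
        System.out.println(countForOrd(counts, 2) + " (misses: " + cache.misses + ")");
    }
}
```

      In this sketch, the second page and the per-ordinal lookup both read the array stored by the first request, so the counting work happens once per (field, domain) pair.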

          People

            Assignee: Unassigned
            Reporter: Michael Gibney (magibney)

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 20m