Lucene - Core / LUCENE-8557

LeafReader.getFieldInfos should always return the same instance


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • 7.5
    • None
    • None
    • None
    • New, Patch Available

    Description

      Most LeafReader implementations cache a FieldInfos instance and return it from LeafReader.getFieldInfos(). A few currently do not, so each call builds a new FieldInfos, and this can cause performance problems.

      The most notable example is Solr's SlowCompositeReaderWrapper, whose lack of caching caused unexpected slowdowns when using Solr's JSON Facets compared to the legacy facets.

      This proposed change is mostly relevant to Solr but touches a few Lucene classes.  Specifically:

      1. Adds a check to TestUtil.checkReader to verify that LeafReader.getFieldInfos() returns the same instance:

       

      // FieldInfos should be cached at the reader and always return the same instance
      if (reader.getFieldInfos() != reader.getFieldInfos()) {
        throw new RuntimeException("getFieldInfos() returned different instances for class: " + reader.getClass());
      }
      

      I'm not entirely sure this check is wanted or needed, but adding it uncovered most of the other LeafReader implementations that were not caching FieldInfos.  I'm happy to remove this part of the patch if it is not wanted.
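
To illustrate what the identity check catches, here is a minimal, self-contained sketch. Infos, Reader, UncachedReader, and CachedReader are hypothetical stand-ins invented for this example and are not Lucene classes; only the comparison by reference mirrors the check in the patch.

```java
import java.util.List;

/** Stand-in types for illustration only; these are not Lucene classes. */
final class FieldInfosIdentityDemo {

  /** Hypothetical stand-in for FieldInfos: an immutable list of field names. */
  static final class Infos {
    final List<String> names;

    Infos(List<String> names) {
      this.names = List.copyOf(names);
    }
  }

  /** Hypothetical stand-in for LeafReader. */
  interface Reader {
    Infos getFieldInfos();
  }

  /** Builds a fresh Infos on every call, so two calls return different objects. */
  static final class UncachedReader implements Reader {
    @Override
    public Infos getFieldInfos() {
      return new Infos(List.of("id", "title"));
    }
  }

  /** Builds Infos once and returns that same object on every call. */
  static final class CachedReader implements Reader {
    private final Infos infos = new Infos(List.of("id", "title"));

    @Override
    public Infos getFieldInfos() {
      return infos;
    }
  }

  /** Compares by reference (==), not equals(), like the check in the patch. */
  static boolean returnsSameInstance(Reader reader) {
    return reader.getFieldInfos() == reader.getFieldInfos();
  }

  public static void main(String[] args) {
    System.out.println("uncached: " + returnsSameInstance(new UncachedReader()));
    System.out.println("cached:   " + returnsSameInstance(new CachedReader()));
  }
}
```

The uncached reader returns equivalent content each time, yet it fails the check because the two calls yield distinct objects. That is why the check surfaces readers that rebuild their field infos per call.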

       

      2. Adds a FieldInfos.EMPTY that can be used in a handful of places

       

      public final static FieldInfos EMPTY = new FieldInfos(new FieldInfo[0]);
      

      There are several places in the Lucene/Solr tests that were creating empty instances of FieldInfos which were causing the check in #1 to fail.  This fixes those failures and cleans up the code a bit.
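
As a rough sketch of why a shared constant helps here: a reader that allocates a new empty holder on each call fails an identity comparison, while one that hands out a shared constant passes. The Infos class below is a hypothetical stand-in, not Lucene's FieldInfos, and the two helper methods exist only for this example.

```java
/** Stand-in for illustration only; this is not Lucene's FieldInfos. */
final class EmptyInfosDemo {

  /** Hypothetical immutable holder playing the role of FieldInfos. */
  static final class Infos {
    /** Shared empty instance, similar in spirit to FieldInfos.EMPTY. */
    static final Infos EMPTY = new Infos(new String[0]);

    private final String[] names;

    Infos(String[] names) {
      this.names = names.clone();
    }

    int size() {
      return names.length;
    }
  }

  /** Mimics test code that allocates a new empty holder on every call. */
  static Infos freshEmpty() {
    return new Infos(new String[0]);
  }

  /** The same code rewritten to return the shared constant. */
  static Infos sharedEmpty() {
    return Infos.EMPTY;
  }

  public static void main(String[] args) {
    System.out.println("fresh same instance:  " + (freshEmpty() == freshEmpty()));
    System.out.println("shared same instance: " + (sharedEmpty() == sharedEmpty()));
  }
}
```

Sharing one instance is safe in this sketch only because the holder is immutable.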

      3. Fixes a few LeafReader implementations that were not caching FieldInfos

      Specifically:

      • MemoryIndex.MemoryIndexReader - The constructor was already looping over the fields so it seemed natural to just create the FieldInfos at that time
      • SlowCompositeReaderWrapper - This was the one causing me trouble.  I've moved the caching of FieldInfos from SolrIndexSearcher to SlowCompositeReaderWrapper.
      • CollapsingQParserPlugin.ReaderWrapper - getFieldInfos() is immediately called twice after this is constructed
      • ExpandComponent.ReaderWrapper - getFieldInfos() is immediately called twice after this is constructed
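
The general shape of these fixes can be sketched as follows, assuming a wrapper that derives its field infos by merging those of its sub-readers. Merger, UncachedWrapper, and CachedWrapper are hypothetical names for this example, and field infos are reduced to lists of field names; the actual patch may cache differently in each class.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Stand-in types for illustration only; these are not Lucene or Solr classes. */
final class CompositeCachingDemo {

  /** Merges per-leaf field names and counts how often the merge runs. */
  static final class Merger {
    int calls = 0;

    List<String> merge(List<List<String>> leaves) {
      calls++;
      Set<String> merged = new LinkedHashSet<>();
      for (List<String> leaf : leaves) {
        merged.addAll(leaf);
      }
      return List.copyOf(merged);
    }
  }

  /** Re-merges on every getFieldInfos() call. */
  static final class UncachedWrapper {
    private final List<List<String>> leaves;
    private final Merger merger;

    UncachedWrapper(List<List<String>> leaves, Merger merger) {
      this.leaves = leaves;
      this.merger = merger;
    }

    List<String> getFieldInfos() {
      return merger.merge(leaves);
    }
  }

  /** Merges once in the constructor and returns the cached result afterwards. */
  static final class CachedWrapper {
    private final List<String> fieldInfos;

    CachedWrapper(List<List<String>> leaves, Merger merger) {
      this.fieldInfos = merger.merge(leaves);
    }

    List<String> getFieldInfos() {
      return fieldInfos;
    }
  }

  public static void main(String[] args) {
    List<List<String>> leaves = List.of(List.of("id", "title"), List.of("id", "body"));

    Merger uncachedMerger = new Merger();
    UncachedWrapper uncached = new UncachedWrapper(leaves, uncachedMerger);
    for (int i = 0; i < 3; i++) {
      uncached.getFieldInfos();
    }

    Merger cachedMerger = new Merger();
    CachedWrapper cached = new CachedWrapper(leaves, cachedMerger);
    for (int i = 0; i < 3; i++) {
      cached.getFieldInfos();
    }

    System.out.println("uncached merges: " + uncachedMerger.calls);
    System.out.println("cached merges:   " + cachedMerger.calls);
  }
}
```

Building the value in the constructor suits wrappers whose getFieldInfos() is called right after construction, as noted for the two ReaderWrapper classes above; a lazily initialized field would be an alternative when the value is often unused.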

       

      4. Minor Solr tweak to avoid calling SolrIndexSearcher.getSlowAtomicReader in FacetFieldProcessorByHashDV.  This change is now optional since SlowCompositeReaderWrapper caches FieldInfos.

       

      As suggested by dsmiley, this issue takes the place of SOLR-12878 since it touches some Lucene code.

       

      Attachments

        1. LUCENE-8557.patch
          23 kB
          David Smiley

            People

              Assignee: dsmiley David Smiley
              Reporter: tpunder Tim Underwood
              Votes: 1
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h