Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-10624

Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 9.0, 9.1, 9.2
    • None
    • core/codecs
    • None
    • New, Patch Available

    Description

      Problem Statement

      We noticed DocValue read performance regression with the iterative API when upgrading from Lucene 5 to Lucene 9. Our latency is increased by 50%. The degradation is similar to what's described in https://issues.apache.org/jira/browse/SOLR-9599 

      By analyzing profiling data, we found method "advanceWithinBlock" and "advanceExactWithinBlock" for Sparse IndexedDISI is slow in Lucene 9 due to their O(N) doc lookup algorithm.

      Changes

      Used binary search algorithm to replace current O(N) lookup algorithm in Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" because docs are in ascending order.

      Test

      ./gradlew tidy
      ./gradlew check 

      Benchmark

      06/30/2022 Update: The below benchmark data points are invalid. I started a new AWS EC2 instance and run the test. The performance of candidate and baseline are very close.

       

      Ran sparseTaxis test cases from luceneutil. Attached the reports of baseline and candidates in attachments section.

      1. Most cases have 5-10% search latency reduction.

      2. Some highlights (>20%):

      • T0 green_pickup_latitude:[40.75 TO 40.9] yellow_pickup_latitude:[40.75 TO 40.9] sort=null
        • Baseline:  10973978+ hits hits in 726.81967 msec
        • Candidate: 10973978+ hits hits in 484.544594 msec
      • T0 cab_color:y cab_color:g sort=null
        • Baseline: 2300174+ hits hits in 95.698324 msec
        • Candidate: 2300174+ hits hits in 78.336193 msec
      • T1 cab_color:y cab_color:g sort=null
        • Baseline: 2300174+ hits hits in 391.565239 msec
        • Candidate: 300174+ hits hits in 227.592885 msec{}
      • ...

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              wwm1994 Weiming Wu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h
                  1h