Lucene - Core / LUCENE-10333

Speed up BinaryDocValues with a batch reading on LongValues

Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/codecs
    • Labels: None
    • Lucene Fields: New

    Description

      In Lucene90DocValuesProducer, BinaryDocValues (as well as SortedNumericDocValues in the non-singleton case) has code patterns like this:

      long startOffset = addresses.get(doc);
      bytes.length = (int) (addresses.get(doc + 1L) - startOffset);
      

      This means we have to read two longs that are stored next to each other (the start offsets of doc and doc + 1). We could push this access pattern down into LongValues and read both values in a single call. I think this is worthwhile because this code can be quite hot.
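      To make the proposal more concrete, here is a minimal Java sketch. It assumes a hypothetical `getTwo(long index, long[] out)` method on a stand-in for `LongValues`; `LongValuesLike`, `getTwo` and `BlockedMonotonic` are illustrative names and do not exist in Lucene. The toy reader only mimics the general shape of a monotonic encoding (a per-block minimum and average slope plus a per-value delta, with deltas kept in a plain `long[]` instead of bit-packed), so that reading `index` and `index + 1` together can load the block header once when both fall into the same block. The default `getTwo` simply calls `get` twice, so other implementations would keep today's behavior.

```java
/**
 * Sketch of a batched two-value read on a LongValues-like reader.
 * All names here are hypothetical; they are not Lucene APIs.
 */
public class PairedRead {

  /** Minimal stand-in for org.apache.lucene.util.LongValues (hypothetical). */
  public abstract static class LongValuesLike {
    public abstract long get(long index);

    /** Hypothetical batch read: out[0] = get(index), out[1] = get(index + 1). */
    public void getTwo(long index, long[] out) {
      // Default: two independent lookups, i.e. what callers do today.
      out[0] = get(index);
      out[1] = get(index + 1);
    }
  }

  /**
   * Toy monotonic encoding: value = mins[block] + (long) (avgs[block] * j) + deltas[index],
   * where j is the position of index inside its block of 2^shift values.
   */
  public static final class BlockedMonotonic extends LongValuesLike {
    private final int shift;
    private final int mask;
    private final long[] mins;
    private final float[] avgs;
    private final long[] deltas;

    public BlockedMonotonic(long[] values, int shift) {
      this.shift = shift;
      this.mask = (1 << shift) - 1;
      int blockSize = 1 << shift;
      int numBlocks = (values.length + blockSize - 1) >>> shift;
      mins = new long[numBlocks];
      avgs = new float[numBlocks];
      deltas = new long[values.length];
      for (int b = 0; b < numBlocks; b++) {
        int start = b << shift;
        int n = Math.min(blockSize, values.length - start);
        float avg = n > 1 ? (values[start + n - 1] - values[start]) / (float) (n - 1) : 0f;
        long min = Long.MAX_VALUE;
        for (int j = 0; j < n; j++) {
          min = Math.min(min, values[start + j] - (long) (avg * j));
        }
        mins[b] = min;
        avgs[b] = avg;
        for (int j = 0; j < n; j++) {
          // Non-negative because min is the smallest residual in the block.
          deltas[start + j] = values[start + j] - (long) (avg * j) - min;
        }
      }
    }

    @Override
    public long get(long index) {
      int block = (int) (index >>> shift);
      int j = (int) (index & mask);
      return mins[block] + (long) (avgs[block] * j) + deltas[(int) index];
    }

    @Override
    public void getTwo(long index, long[] out) {
      int block = (int) (index >>> shift);
      int j = (int) (index & mask);
      long min = mins[block]; // block header is loaded once...
      float avg = avgs[block];
      out[0] = min + (long) (avg * j) + deltas[(int) index];
      if (j < mask) {
        // ...and reused when index + 1 falls into the same block.
        out[1] = min + (long) (avg * (j + 1)) + deltas[(int) index + 1];
      } else {
        out[1] = get(index + 1); // index + 1 starts the next block
      }
    }
  }

  public static void main(String[] args) {
    // Start offset of each doc's bytes, plus one trailing end offset (like "addresses").
    long[] addresses = {0, 3, 3, 10, 12, 20, 21, 30, 45};
    LongValuesLike reader = new BlockedMonotonic(addresses, 2);
    long[] pair = new long[2];
    for (int doc = 0; doc + 1 < addresses.length; doc++) {
      reader.getTwo(doc, pair);
      // Same result as addresses.get(doc + 1L) - addresses.get(doc).
      int length = (int) (pair[1] - pair[0]);
      System.out.println("doc=" + doc + " start=" + pair[0] + " length=" + length);
    }
  }
}
```

      The caller must guarantee that `index + 1` is a valid index, which holds for the address arrays above since they store one more entry than there are documents. The same `pair[1] - pair[0]` subtraction would give the value count in the SortedNumericDocValues non-singleton case. Whether the shared block lookup pays off in the real monotonic reader is what the benchmarks have to show.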

      Benchmark

      In today's LuceneUtil benchmark, all results look even. I suspect this is because no task uses BinaryDocValues any more. So I rolled both the baseline and the candidate back to an older code version (before https://issues.apache.org/jira/browse/LUCENE-10062), in which taxonomy ordinals were still stored in BinaryDocValues, and there a QPS increase can be seen. (This is tricky; I wonder if there is a more official way to benchmark BinaryDocValues, e.g. by changing some parameters or adding some tasks?) Anyway, I believe it is still worth optimizing BinaryDocValues even though facets no longer use it.

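      Benchmark results on the older code version, where taxonomy ordinals are stored in BinaryDocValues (to show the speedup in BinaryDocValues). The largest gains are in the Browse*TaxoFacets tasks (+25.4% to +27.0%, p-value 0.000); most non-facet tasks stay within noise: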

                                  Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                 BrowseMonthSSDVFacets       17.25      (8.6%)       16.78     (17.8%)   -2.7% ( -26% -   25%) 0.536
                               LowTerm     1458.66      (3.6%)     1438.15      (4.4%)   -1.4% (  -9% -    6%) 0.268
                 HighTermDayOfYearSort      108.55     (10.0%)      108.04      (9.1%)   -0.5% ( -17% -   20%) 0.874
                            HighPhrase      168.65      (1.9%)      168.06      (2.3%)   -0.3% (  -4% -    3%) 0.602
                          OrNotHighLow     1201.79      (3.4%)     1197.93      (4.6%)   -0.3% (  -8% -    7%) 0.801
                          HighSpanNear       15.26      (1.6%)       15.21      (1.4%)   -0.3% (  -3% -    2%) 0.499
                               Respell       62.61      (1.8%)       62.45      (1.9%)   -0.3% (  -3% -    3%) 0.649
                             MedPhrase       57.57      (1.4%)       57.44      (1.8%)   -0.2% (  -3% -    2%) 0.648
                             OrHighMed      129.10      (3.0%)      128.83      (3.1%)   -0.2% (  -6% -    6%) 0.830
                           MedSpanNear       19.45      (2.3%)       19.41      (2.2%)   -0.2% (  -4% -    4%) 0.784
                            OrHighHigh       34.85      (1.5%)       34.79      (1.4%)   -0.2% (  -3% -    2%) 0.722
                  HighIntervalsOrdered       26.92      (4.7%)       26.89      (4.9%)   -0.1% (  -9% -    9%) 0.929
                                IntNRQ      343.52      (1.6%)      343.16      (2.0%)   -0.1% (  -3% -    3%) 0.855
                         OrHighNotHigh      595.61      (3.2%)      595.10      (4.3%)   -0.1% (  -7% -    7%) 0.944
                   MedIntervalsOrdered       17.66      (3.6%)       17.65      (3.8%)   -0.1% (  -7% -    7%) 0.961
                   LowIntervalsOrdered      109.23      (3.3%)      109.18      (3.5%)   -0.0% (  -6% -    7%) 0.969
                           AndHighHigh       81.09      (1.5%)       81.10      (2.0%)    0.0% (  -3% -    3%) 0.967
                           LowSpanNear      203.33      (2.1%)      203.41      (1.8%)    0.0% (  -3% -    3%) 0.948
                       MedSloppyPhrase       27.15      (1.5%)       27.17      (1.2%)    0.1% (  -2% -    2%) 0.907
                             LowPhrase       75.76      (1.8%)       75.81      (2.0%)    0.1% (  -3% -    3%) 0.904
               AndHighMedDayTaxoFacets       97.27      (1.9%)       97.35      (1.9%)    0.1% (  -3% -    4%) 0.888
                      HighSloppyPhrase       14.32      (2.7%)       14.34      (1.8%)    0.1% (  -4% -    4%) 0.870
                                Fuzzy2       76.00      (3.9%)       76.12      (3.4%)    0.2% (  -6% -    7%) 0.894
                              Wildcard      123.51      (1.8%)      123.71      (2.1%)    0.2% (  -3% -    4%) 0.796
                          OrHighNotLow      722.64      (4.4%)      724.15      (5.4%)    0.2% (  -9% -   10%) 0.894
                            AndHighLow      929.73      (4.0%)      931.75      (3.8%)    0.2% (  -7% -    8%) 0.859
                               Prefix3      240.13      (1.5%)      240.69      (1.9%)    0.2% (  -3% -    3%) 0.675
                            AndHighMed      210.17      (1.7%)      210.84      (1.6%)    0.3% (  -2% -    3%) 0.532
                       LowSloppyPhrase      142.83      (1.8%)      143.54      (2.0%)    0.5% (  -3% -    4%) 0.410
                          OrNotHighMed      709.24      (4.4%)      712.78      (4.3%)    0.5% (  -7% -    9%) 0.715
                                Fuzzy1       85.33      (5.7%)       85.77      (6.3%)    0.5% ( -10% -   13%) 0.786
                               MedTerm     1466.50      (3.5%)     1474.85      (3.9%)    0.6% (  -6% -    8%) 0.629
                            TermDTSort      105.51      (7.7%)      106.33      (7.3%)    0.8% ( -13% -   17%) 0.746
                              PKLookup      206.18      (2.9%)      208.68      (2.9%)    1.2% (  -4% -    7%) 0.179
                          OrHighNotMed      876.71      (3.0%)      887.84      (3.9%)    1.3% (  -5% -    8%) 0.251
                         OrNotHighHigh      774.25      (4.7%)      785.03      (6.0%)    1.4% (  -8% -   12%) 0.411
                     HighTermMonthSort       74.33      (9.4%)       75.47     (16.3%)    1.5% ( -22% -   30%) 0.716
                             OrHighLow      518.73      (5.2%)      528.27      (5.4%)    1.8% (  -8% -   13%) 0.272
                              HighTerm     1892.16      (3.4%)     1934.63      (5.5%)    2.2% (  -6% -   11%) 0.120
              AndHighHighDayTaxoFacets       16.46      (2.7%)       16.84      (2.3%)    2.3% (  -2% -    7%) 0.004
                  HighTermTitleBDVSort      141.39     (14.6%)      145.33     (15.1%)    2.8% ( -23% -   38%) 0.554
                  MedTermDayTaxoFacets       27.81      (2.1%)       29.54      (2.3%)    6.2% (   1% -   10%) 0.000
                OrHighMedDayTaxoFacets        3.05      (1.9%)        3.30      (2.2%)    8.3% (   4% -   12%) 0.000
             BrowseDayOfYearSSDVFacets       17.36     (13.0%)       18.97     (15.8%)    9.3% ( -17% -   43%) 0.042
             BrowseDayOfYearTaxoFacets        3.02      (3.6%)        3.79      (2.5%)   25.4% (  18% -   32%) 0.000
                  BrowseDateTaxoFacets        3.01      (3.6%)        3.79      (2.5%)   25.6% (  18% -   32%) 0.000
                 BrowseMonthTaxoFacets        3.14      (2.1%)        3.99      (2.5%)   27.0% (  21% -   32%) 0.000
      

      Benchmark results on the newest code version (no task shows a significant difference):

                                  Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                            TermDTSort      129.74     (10.9%)      127.83     (11.3%)   -1.5% ( -21% -   23%) 0.675
                              HighTerm     1182.13      (5.1%)     1172.76      (6.5%)   -0.8% ( -11% -   11%) 0.668
                          HighSpanNear        7.99      (4.2%)        7.96      (4.2%)   -0.3% (  -8% -    8%) 0.816
                  HighIntervalsOrdered       17.86      (2.1%)       17.85      (2.3%)   -0.1% (  -4% -    4%) 0.927
                  BrowseDateTaxoFacets       19.61     (17.2%)       19.61     (17.4%)   -0.0% ( -29% -   41%) 0.995
                         OrNotHighHigh      619.85      (4.3%)      619.72      (8.6%)   -0.0% ( -12% -   13%) 0.992
                              PKLookup      202.14      (5.6%)      202.11      (4.4%)   -0.0% (  -9% -   10%) 0.994
                   LowIntervalsOrdered       25.53      (1.5%)       25.53      (1.6%)    0.0% (  -3% -    3%) 1.000
             BrowseDayOfYearSSDVFacets       14.27      (2.7%)       14.28      (2.7%)    0.0% (  -5% -    5%) 0.965
                   MedIntervalsOrdered       47.33      (1.9%)       47.34      (2.0%)    0.0% (  -3% -    3%) 0.947
           BrowseRandomLabelSSDVFacets       10.25      (2.4%)       10.26      (2.4%)    0.1% (  -4% -    4%) 0.935
                 BrowseMonthSSDVFacets       15.66      (3.0%)       15.67      (3.0%)    0.1% (  -5% -    6%) 0.945
                       MedSloppyPhrase       11.97      (1.7%)       11.98      (1.9%)    0.1% (  -3% -    3%) 0.840
                              Wildcard       25.71      (2.6%)       25.75      (2.4%)    0.1% (  -4% -    5%) 0.875
                             MedPhrase       33.62      (2.5%)       33.68      (2.6%)    0.2% (  -4% -    5%) 0.802
                 HighTermDayOfYearSort       80.58     (11.0%)       80.76     (10.6%)    0.2% ( -19% -   24%) 0.949
                  HighTermTitleBDVSort      130.43     (11.7%)      130.73     (10.7%)    0.2% ( -19% -   25%) 0.947
              AndHighHighDayTaxoFacets       32.25      (3.0%)       32.33      (2.9%)    0.2% (  -5% -    6%) 0.796
                       LowSloppyPhrase       39.50      (1.7%)       39.61      (1.4%)    0.3% (  -2% -    3%) 0.586
                               Prefix3      127.42      (3.8%)      127.77      (3.4%)    0.3% (  -6% -    7%) 0.812
                     HighTermMonthSort      117.65      (8.4%)      117.98      (8.1%)    0.3% ( -14% -   18%) 0.915
                      HighSloppyPhrase       14.47      (1.8%)       14.51      (2.2%)    0.3% (  -3% -    4%) 0.647
                           MedSpanNear       48.78      (2.2%)       48.93      (2.0%)    0.3% (  -3% -    4%) 0.640
                OrHighMedDayTaxoFacets       13.42      (3.7%)       13.48      (3.6%)    0.4% (  -6% -    7%) 0.730
               AndHighMedDayTaxoFacets       37.90      (3.0%)       38.05      (3.4%)    0.4% (  -5% -    7%) 0.694
                                Fuzzy1       83.31      (3.9%)       83.70      (4.9%)    0.5% (  -7% -    9%) 0.738
                               Respell       49.74      (1.3%)       50.00      (1.5%)    0.5% (  -2% -    3%) 0.254
                             OrHighLow      531.57      (8.0%)      534.83      (6.7%)    0.6% ( -13% -   16%) 0.792
                           AndHighHigh       71.99      (2.6%)       72.44      (3.4%)    0.6% (  -5% -    6%) 0.520
                           LowSpanNear      191.64      (3.5%)      192.85      (3.7%)    0.6% (  -6% -    8%) 0.580
                  MedTermDayTaxoFacets       55.51      (3.1%)       55.86      (3.9%)    0.6% (  -6% -    7%) 0.567
           BrowseRandomLabelTaxoFacets    11492.93      (5.0%)    11570.83      (4.8%)    0.7% (  -8% -   11%) 0.663
                                IntNRQ       93.40      (2.1%)       94.05      (2.4%)    0.7% (  -3% -    5%) 0.319
                            AndHighMed      175.02      (2.6%)      176.42      (3.9%)    0.8% (  -5% -    7%) 0.445
                                Fuzzy2       45.25      (7.2%)       45.64      (6.2%)    0.9% ( -11% -   15%) 0.682
                            AndHighLow      825.32      (6.8%)      833.43      (8.0%)    1.0% ( -12% -   16%) 0.677
                               MedTerm     1408.91      (6.2%)     1423.27     (10.2%)    1.0% ( -14% -   18%) 0.703
                             OrHighMed      136.68      (3.8%)      138.15      (3.6%)    1.1% (  -6% -    8%) 0.356
                            OrHighHigh       16.31      (3.4%)       16.49      (1.9%)    1.1% (  -4% -    6%) 0.205
             BrowseDayOfYearTaxoFacets    11349.30      (4.4%)    11494.17      (4.6%)    1.3% (  -7% -   10%) 0.366
                            HighPhrase       83.13      (2.9%)       84.24      (3.4%)    1.3% (  -4% -    7%) 0.184
                          OrHighNotMed      630.30      (5.6%)      639.65      (6.4%)    1.5% (  -9% -   14%) 0.436
                             LowPhrase      310.17      (4.2%)      315.08      (5.4%)    1.6% (  -7% -   11%) 0.297
                         OrHighNotHigh      723.22      (5.0%)      734.71      (8.4%)    1.6% ( -11% -   15%) 0.468
                 BrowseMonthTaxoFacets    11665.05      (7.6%)    11892.66      (5.1%)    2.0% (  -9% -   15%) 0.339
                          OrHighNotLow      851.60      (6.5%)      869.16      (7.6%)    2.1% ( -11% -   17%) 0.355
                          OrNotHighMed      699.29      (5.2%)      717.74      (7.7%)    2.6% (  -9% -   16%) 0.205
                          OrNotHighLow      954.65      (6.4%)      982.93      (9.6%)    3.0% ( -12% -   20%) 0.252
                               LowTerm     2158.23      (9.1%)     2227.33     (13.4%)    3.2% ( -17% -   28%) 0.377
      
      

People

    • Assignee: Unassigned
    • Reporter: Feng Guo (gf2121)
    • Votes: 0
    • Watchers: 1


Time Tracking

    • Estimated: Not Specified
    • Remaining: 0h
    • Logged: 3h 10m