  1. Solr
  2. SOLR-16515

Remove synchronized access to cachedOrdMaps in SlowCompositeReaderWrapper


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 9.0, 8.11.2
    • Fix Version/s: 9.2
    • Component/s: search
    • Labels: None

    Description

      The SlowCompositeReaderWrapper uses synchronized read and write access to its internal cachedOrdMaps. By using a ConcurrentHashMap instead of a LinkedHashMap as the underlying cachedOrdMaps implementation, and the ConcurrentHashMap#computeIfAbsent method to compute cache values, we were able to reduce lock contention significantly.
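
      The swap described above can be sketched roughly as follows. This is a minimal illustration, not the Solr source: the class names, the String keys and the String values standing in for OrdinalMap instances are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in classes for illustration; not the actual Solr code.
final class CacheAccessSketch {

  // Before: every lookup, including a cache hit, has to take the same monitor.
  static final class SynchronizedCache {
    private final Map<String, String> cachedOrdMaps = new LinkedHashMap<>();

    String get(String field, Function<String, String> builder) {
      synchronized (cachedOrdMaps) {
        String map = cachedOrdMaps.get(field);
        if (map == null) {
          map = builder.apply(field); // expensive build runs while holding the lock
          cachedOrdMaps.put(field, map);
        }
        return map;
      }
    }
  }

  // After: no map-wide lock; computeIfAbsent is atomic per key and applies
  // the builder at most once for a given key.
  static final class ConcurrentCache {
    private final Map<String, String> cachedOrdMaps = new ConcurrentHashMap<>();

    String get(String field, Function<String, String> builder) {
      return cachedOrdMaps.computeIfAbsent(field, builder);
    }
  }
}
```

      Both variants return the same cached value for repeated lookups of a field; the difference is that the second one no longer serializes all callers on a single monitor.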

      Background

      Under heavy load, we found that application halts inside Solr become a serious problem in high-traffic environments. Using Java Flight Recorder recordings, we discovered a high accumulated halt time on the cachedOrdMaps in SlowCompositeReaderWrapper. Without this fix, we were able to utilize our machines only up to 25% CPU usage. With the fix applied, utilization of up to 80% is achievable.

      Description

      Our Solr instances use the collapse component heavily. The instances run with 32 cores and a 32 GB Java heap on a rather small index (4 GB). The instances scale out at 50% CPU load. We take a 60-second Java Flight Recorder snapshot as soon as the CPU usage exceeds 50%.

      During our 60-second Java Flight Recorder snapshot, the ~2k Jetty threads accumulated more than 16 hours of locking time inside the SlowCompositeReaderWrapper (see the attached slow-composite-reader-wrapper-before.jpg). With this fix applied, locking is reduced to cache write accesses only. We validated this using another JFR snapshot (see the attached slow-composite-reader-wrapper-after.jpg).

      Solution

      We propose the following improvement to the SlowCompositeReaderWrapper, which removes blocking synchronized access to the internal cachedOrdMaps. The implementation keeps the semantics of the getSortedDocValues and getSortedSetDocValues methods but moves the expensive part, OrdinalMap#build, into a producer. The cache is accessed only through ConcurrentHashMap#computeIfAbsent, with the producer passed as the mapping function.
      The current implementation uses the synchronized block not only to lock access to the cachedOrdMaps but also to protect the critical section between getting, building and putting the OrdinalMap into the cache. Inside that critical section, the code decides whether a cacheable value should be built and added to the cache.
      To support non-blocking read access to the cache, we move the building part of the critical section into a producer Function. The check for whether we have a cacheable value is made up front. To make that decision properly, we had to take logic from MultiDocValues#getSortedSetValues and MultiDocValues#getSortedValues (the SlowCompositeReaderWrapper already contained duplicated code from those methods).
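
      The restructuring described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch: FakeOrdinalMap, the boolean cacheable flag and the build counter are hypothetical stand-ins, and the real cacheability check is derived from the MultiDocValues logic rather than passed in by the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; names and types are not taken from Solr or Lucene.
final class ProducerCacheSketch {

  /** Stand-in for an expensive-to-build OrdinalMap. */
  static final class FakeOrdinalMap {
    final String field;
    FakeOrdinalMap(String field) { this.field = field; }
  }

  private final Map<String, FakeOrdinalMap> cachedOrdMaps = new ConcurrentHashMap<>();
  private final AtomicInteger builds = new AtomicInteger();

  FakeOrdinalMap getOrdinalMap(String field, boolean cacheable) {
    // The expensive build is wrapped in a producer instead of sitting
    // inside a synchronized critical section.
    Function<String, FakeOrdinalMap> producer = f -> {
      builds.incrementAndGet();
      return new FakeOrdinalMap(f);
    };

    // The cacheability decision is made up front (assumed here to be a flag).
    if (!cacheable) {
      return producer.apply(field); // build a fresh value, bypass the cache
    }
    // Cacheable: the producer runs at most once per field.
    return cachedOrdMaps.computeIfAbsent(field, producer);
  }

  int buildCount() { return builds.get(); }

  int cachedFields() { return cachedOrdMaps.size(); }
}
```

      One design consequence worth keeping in mind: per the ConcurrentHashMap documentation, other update operations on the map may block while a mapping function is running, so blocking is limited to the write path rather than removed entirely.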

      Summary

      This change removes most blocking access inside the SlowCompositeReaderWrapper, and despite its name, it is now capable of much higher request throughput.
      This change was put together by Dennis Berger, Torsten Bøgh Köster and Marco Petris.

      Attachments

        1. slow-composite-reader-wrapper-after.jpg
          334 kB
          Torsten Bøgh Köster
        2. slow-composite-reader-wrapper-before.jpg
          341 kB
          Torsten Bøgh Köster


            People

              Assignee: Unassigned
              Reporter: Torsten Bøgh Köster (tboeghk)
              Votes: 1
              Watchers: 5

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h