  1. Jackrabbit Oak
  2. OAK-10494

Cache backend.getRecord() calls to minimise CloudBlob.downloadAttributes() over the network


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: blob-plugins
    • Labels: None

    Description

      Problem: Metadata are requested more than once for each blob

      Setting a breakpoint in AbstractSharedCachingDataStore.getRecordIfStored() and logging the dataIdentifiers shows that backend.getRecord() is called 3 times for the same dataIdentifier when vault installs a replication package. The reason seems to be that, during a commit, every CommitHook runs its own compareAgainstBaseState. Because the implementation avoids fetching the blob when it only needs the metadata, the blob never enters the existing blob cache, so every request to that cache is a miss.
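
The access pattern described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Oak's code: ToyBackend, ToyDataStore and ToyCommit are hypothetical names, the "metadata" is a dummy length value, and the three hooks are simulated by a plain loop. It shows why a cache that is filled only when blob content is downloaded does not help metadata-only lookups.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types are Oak classes.
final class ToyBackend {
    // Counts simulated remote metadata requests (stand-in for downloadAttributes()).
    int metadataCalls = 0;

    long fetchLength(String id) {
        metadataCalls++;
        return id.length(); // dummy metadata value
    }
}

final class ToyDataStore {
    private final ToyBackend backend;
    // Stand-in for the existing blob cache: filled only when blob content is downloaded.
    private final Map<String, byte[]> blobCache = new HashMap<>();

    ToyDataStore(ToyBackend backend) {
        this.backend = backend;
    }

    // Metadata-only lookup: consults the blob cache but never populates it.
    long lengthIfStored(String id) {
        byte[] cached = blobCache.get(id);
        if (cached != null) {
            return cached.length;
        }
        // The content is not fetched here, so nothing is added to blobCache.
        return backend.fetchLength(id);
    }
}

final class ToyCommit {
    // Simulates three commit hooks that each look up the same blob independently.
    static int backendCallsForThreeHooks() {
        ToyBackend backend = new ToyBackend();
        ToyDataStore store = new ToyDataStore(backend);
        for (int hook = 0; hook < 3; hook++) {
            store.lengthIfStored("blob-1");
        }
        return backend.metadataCalls;
    }
}
```

In this model every one of the three lookups reaches the backend, which matches the three backend.getRecord() calls observed per dataIdentifier.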

      Proposed solution: Cache backend.getRecord() calls

      Manual testing has shown that caching backend.getRecord() calls reduces the time spent in getRecordIfStored() by 12% to 35% when installing replication packages containing 500 paths.
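
As a rough illustration of the idea, a record lookup can be memoized per identifier. The sketch below is not the code from the PR: CachingRecordLookup and its demo class are hypothetical, the backend is modelled as a plain Function, and the cache is unbounded, whereas a real data store would presumably need a size limit or expiry as well as invalidation when records are deleted or replaced.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: memoizes a record lookup per identifier.
final class CachingRecordLookup<R> {
    // Stand-in for backend.getRecord(); assumed to be the expensive remote call.
    private final Function<String, R> backendLookup;
    private final Map<String, R> cache = new ConcurrentHashMap<>();

    CachingRecordLookup(Function<String, R> backendLookup) {
        this.backendLookup = backendLookup;
    }

    // Calls the backend only on the first lookup of a given identifier.
    R getRecord(String identifier) {
        return cache.computeIfAbsent(identifier, backendLookup);
    }

    // Drops a cached entry, e.g. after the record was deleted or replaced.
    void invalidate(String identifier) {
        cache.remove(identifier);
    }
}

final class CachingRecordLookupDemo {
    // Three lookups of the same identifier, as three commit hooks would issue them.
    static int backendCallsForThreeLookups() {
        AtomicInteger calls = new AtomicInteger();
        CachingRecordLookup<Long> lookup = new CachingRecordLookup<>(id -> {
            calls.incrementAndGet();
            return (long) id.length(); // dummy record value
        });
        for (int hook = 0; hook < 3; hook++) {
            lookup.getRecord("blob-1");
        }
        return calls.get();
    }
}
```

With the cache in place, the three lookups in the demo result in a single backend call. A lookup that returns null is not cached by computeIfAbsent, so a missing record would be requested again on the next call.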

      The PR is at https://github.com/apache/jackrabbit-oak/pull/1155

      Attachments

        1. Measurements.md
          81 kB
          Axel Hanikel


            People

              Assignee: Unassigned
              Reporter: Axel Hanikel (ahanikel)
              Votes: 0
              Watchers: 1
