Apache Ozone / HDDS-6517 Snapshot support for Ozone / HDDS-7935

[Snapshot] LRU Cache entries may get evicted/closed during long running processes

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None

    Description

      In the current snapshot LRU cache implementation, evicting the oldest snapshot also closes the corresponding RocksDB instance: https://github.com/apache/ozone/blob/3f7ded2a34c0c35b89901e222ceaee0d1fdf08b6/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java#L124

      That is probably fine for short-lived tasks like users reading snapshots, but it is probably not safe for long-lived tasks like snapdiff and maybe snapshot delete.

      The problem is that a cache entry is currently only refreshed when the snapshot is initially retrieved from the cache; subsequent reads from the snapshot itself don't refresh it. Thus it is possible for a RocksDB instance to be evicted and closed in the middle of snapdiff processing.
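      To make the failure mode concrete, here is a minimal sketch. It is not the OmSnapshotManager code: it assumes an access-ordered LinkedHashMap as the LRU cache, and FakeSnapshotDb and ClosingLruCache are hypothetical stand-ins for the RocksDB handle and the snapshot cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an open snapshot RocksDB instance. */
class FakeSnapshotDb implements AutoCloseable {
  private final String name;
  private boolean closed;

  FakeSnapshotDb(String name) {
    this.name = name;
  }

  /** Simulates a read; fails once the handle has been closed. */
  String read(String key) {
    if (closed) {
      throw new IllegalStateException(name + " is closed");
    }
    return name + ":" + key;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Access-ordered LRU cache that closes an entry as soon as it is evicted. */
class ClosingLruCache {
  private final Map<String, FakeSnapshotDb> entries;

  ClosingLruCache(final int capacity) {
    this.entries = new LinkedHashMap<String, FakeSnapshotDb>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, FakeSnapshotDb> eldest) {
        if (size() > capacity) {
          // Eviction closes the handle even if a caller still holds it.
          eldest.getValue().close();
          return true;
        }
        return false;
      }
    };
  }

  /** Only this call refreshes recency; reads on the returned handle do not. */
  synchronized FakeSnapshotDb get(String name) {
    FakeSnapshotDb db = entries.get(name);
    if (db == null) {
      db = new FakeSnapshotDb(name);
      entries.put(name, db);
    }
    return db;
  }
}
```

      In this sketch, a long-running task that holds the handle for "snap1" while other callers fetch enough other snapshots to exceed the capacity will find its handle closed on its next read.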

      One alternative I can think of is to add some kind of reference counting scheme so that RocksDB instances aren't closed automatically on eviction.
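      A rough sketch of what reference counting could look like, under the assumption that eviction only marks an entry and the last release performs the close. RefCountedSnapshot, RefCountedSnapshotCache, acquire and release are hypothetical names, not existing Ozone APIs, and the closed flag stands in for closing the real RocksDB instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical snapshot handle; its fields are guarded by the cache's lock. */
class RefCountedSnapshot {
  final String name;
  int refCount;      // number of callers currently using this snapshot
  boolean evicted;   // true once the cache has dropped this entry
  boolean closed;    // stands in for closing the underlying DB

  RefCountedSnapshot(String name) {
    this.name = name;
  }

  boolean isClosed() {
    return closed;
  }
}

/** LRU cache that defers closing an evicted entry until nobody references it. */
class RefCountedSnapshotCache {
  private final Map<String, RefCountedSnapshot> entries;

  RefCountedSnapshotCache(final int capacity) {
    this.entries = new LinkedHashMap<String, RefCountedSnapshot>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, RefCountedSnapshot> eldest) {
        if (size() <= capacity) {
          return false;
        }
        RefCountedSnapshot snap = eldest.getValue();
        snap.evicted = true;
        if (snap.refCount == 0) {
          snap.closed = true;   // unused entry: safe to close right away
        }
        return true;            // drop from the cache either way
      }
    };
  }

  /** Returns the snapshot with its reference count incremented. */
  synchronized RefCountedSnapshot acquire(String name) {
    RefCountedSnapshot snap = entries.get(name);
    if (snap == null) {
      snap = new RefCountedSnapshot(name);
      snap.refCount = 1;        // pin before inserting so it cannot be closed
      entries.put(name, snap);
    } else {
      snap.refCount++;
    }
    return snap;
  }

  /** Drops one reference; closes the snapshot if it was evicted meanwhile. */
  synchronized void release(RefCountedSnapshot snap) {
    if (snap.refCount <= 0) {
      throw new IllegalStateException("release without acquire: " + snap.name);
    }
    snap.refCount--;
    if (snap.refCount == 0 && snap.evicted) {
      snap.closed = true;
    }
  }
}
```

      With this shape, a snapdiff task would acquire both snapshots up front and release them in a finally block, so eviction during the diff only delays the close.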

      Another possibility is to have an entirely separate pool of snapshot entries, outside of the cache, that are explicitly opened and closed by long-running tasks like snapdiff.
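      The separate-pool idea could be sketched roughly as follows. This is only an illustration: LongRunningSnapshotPool and PooledSnapshot are hypothetical names, and the closed flag stands in for opening and closing a real RocksDB instance. Entries in the pool are never evicted; they live until the task that opened them closes them.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical handle owned by a long-running task, not by the LRU cache. */
class PooledSnapshot implements AutoCloseable {
  private final String name;
  private final LongRunningSnapshotPool pool;
  private boolean closed;

  PooledSnapshot(String name, LongRunningSnapshotPool pool) {
    this.name = name;
    this.pool = pool;
  }

  /** Simulates a read; fails once the handle has been closed. */
  synchronized String read(String key) {
    if (closed) {
      throw new IllegalStateException(name + " is closed");
    }
    return name + ":" + key;
  }

  synchronized boolean isClosed() {
    return closed;
  }

  /** Closing is explicit and idempotent; nothing else closes this handle. */
  @Override
  public synchronized void close() {
    if (!closed) {
      closed = true;
      pool.onClose(this);
    }
  }
}

/** Pool of explicitly opened snapshots with no eviction policy. */
class LongRunningSnapshotPool {
  private final Set<PooledSnapshot> open = new HashSet<>();

  /** Opens a dedicated handle for the caller; the caller must close it. */
  synchronized PooledSnapshot open(String name) {
    PooledSnapshot snap = new PooledSnapshot(name, this);
    open.add(snap);
    return snap;
  }

  synchronized void onClose(PooledSnapshot snap) {
    open.remove(snap);
  }

  synchronized int openCount() {
    return open.size();
  }
}
```

      A long-running task would typically wrap both handles in a try-with-resources statement, for example try (PooledSnapshot from = pool.open("snap1"); PooledSnapshot to = pool.open("snap2")) { ... }, so they are closed when the task finishes or fails.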

            People

              Assignee: Siyao Meng (smeng)
              Reporter: George Jahad (georgeJahad)