Jackrabbit Oak / OAK-10688

Keep only traversed state, remove all other revisions

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: documentmk

    Description

      As a slightly different algorithm from OAK-10535, this ticket suggests calculating the traversedState of a node, keeping only the revisions needed for that traversedState, and removing all others. The main difference is an inversion of logic: instead of analysing for each revision whether it must be kept or not, this approach first derives the revisions that must be kept from the traversedState, then deletes all others.

      This mechanism applies to all (normal and bundled) properties as well as some DocumentNodeStore internal ones, such as "_deleted".

      Below is a list of assumptions to back this:

      • DetailedGC runs only up to the older of the oldest checkpoint and maxRevisionAge (24h by default). Thus a document analysed by DetailedGC is guaranteed to have only 1 revision (per property) that must be kept, as it is guaranteed to have no modifications (revisions) younger than any checkpoint or maxRevisionAge (24h).
      • To find out which revision(s) must be kept, the node tree is traversed from root (based on current head revision) to the target document.
      • Given the first bullet (we are only looking at nodes that have exactly 1 revision to keep per property), this traversed node state thus contains the values of those revisions.
      • Hence, for each property key of the traversed state, the corresponding "commit revision" in the document-local map must be calculated. That local map entry must be kept; all others can be deleted.
      • Note that this also cleans up overwritten branch commits of the same branch (as only the last, relevant one is kept)
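
The assumptions above can be illustrated with a minimal sketch. It is a simplified model, not the Oak DocumentNodeStore API: the class and method names are hypothetical, revisions are modelled as plain timestamps, and the "commit revision" is assumed to be the newest committed revision in the local map. As an extra safety guard (also an assumption of this sketch), nothing is deleted when that revision does not explain the traversed value.

```java
import java.util.NavigableMap;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified model; not the actual Oak implementation. */
final class KeepTraversedStateSketch {

    /**
     * Returns the revisions of one property's local map that can be deleted.
     *
     * @param localMap       revision (modelled as a timestamp) to stored value
     * @param committed      revisions known to be committed
     * @param traversedValue value of the property in the traversed state
     */
    static Set<Long> revisionsToDelete(NavigableMap<Long, String> localMap,
                                       Set<Long> committed,
                                       String traversedValue) {
        Long keep = null;
        // Walk from newest to oldest and stop at the first committed revision.
        for (Long rev : localMap.descendingKeySet()) {
            if (committed.contains(rev)) {
                keep = rev;
                break;
            }
        }
        // Guard: if the candidate does not match the traversed value, delete nothing.
        if (keep == null || !Objects.equals(localMap.get(keep), traversedValue)) {
            return new TreeSet<>();
        }
        // Inverted logic: everything except the single kept revision is garbage.
        Set<Long> delete = new TreeSet<>(localMap.keySet());
        delete.remove(keep);
        return delete;
    }
}
```

With a local map holding revisions 1 to 4, of which 1 to 3 are committed, and a traversed value equal to the value at revision 3, the sketch keeps revision 3 and returns 1, 2 and 4 for deletion.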

      As a result of the above, certain other entries can be deleted, namely:

      • any "_commitRoot" entry no longer referenced by the local document
      • any "_bc" entry no longer referenced by the local document
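
A rough sketch of this follow-up step, under the same simplified model (hypothetical names, revisions as timestamps): an entry of "_commitRoot" or "_bc" is treated as unreferenced when its revision is not among the revisions kept for any property of the local document.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; applies equally to "_commitRoot" and "_bc" keys. */
final class UnreferencedEntriesSketch {

    /**
     * @param entryRevisions          revision keys of the "_commitRoot" or "_bc" map
     * @param keptRevisionsPerProperty revisions kept for each property after cleanup
     * @return entry revisions that no kept property revision refers to
     */
    static Set<Long> unreferenced(Set<Long> entryRevisions,
                                  Collection<Set<Long>> keptRevisionsPerProperty) {
        // Collect every revision still in use by the local document.
        Set<Long> referenced = new HashSet<>();
        for (Set<Long> kept : keptRevisionsPerProperty) {
            referenced.addAll(kept);
        }
        // Whatever is not referenced any more can be removed.
        Set<Long> result = new TreeSet<>(entryRevisions);
        result.removeAll(referenced);
        return result;
    }
}
```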

      Independent of the traversedState and the outcome of the cleanup, the following can also be removed:

      • any "_revisions" entry older than the current sweepRev

      However, "_revisions" entries that are not referenced by the local document but are younger than the sweepRev must still be kept, as they might be referenced by child documents (through their "_commitRoot" pointing to the current document). Without checking for children and double-checking the actual use, some garbage "_revisions" entries can therefore remain.
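
The "_revisions" rule can be sketched as a simple filter. This is an illustration only: the Rev class and the method are hypothetical, and the sketch assumes that the sweepRev is tracked per cluster id and that an entry with no known sweepRev for its cluster id is kept. Entries younger than the sweepRev are never returned, which matches the conservative behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of "_revisions" cleanup. */
final class RevisionsCleanupSketch {

    /** Minimal stand-in for a revision: timestamp plus cluster id. */
    static final class Rev {
        final long timestamp;
        final int clusterId;

        Rev(long timestamp, int clusterId) {
            this.timestamp = timestamp;
            this.clusterId = clusterId;
        }
    }

    /**
     * @param revisionsEntries        keys of the "_revisions" map
     * @param sweepTimestampByCluster sweepRev timestamp per cluster id (assumed layout)
     * @return entries strictly older than the sweepRev of their cluster id
     */
    static List<Rev> removable(List<Rev> revisionsEntries,
                               Map<Integer, Long> sweepTimestampByCluster) {
        List<Rev> result = new ArrayList<>();
        for (Rev r : revisionsEntries) {
            Long sweep = sweepTimestampByCluster.get(r.clusterId);
            // Keep entries without a known sweepRev and entries not older than it.
            if (sweep != null && r.timestamp < sweep) {
                result.add(r);
            }
        }
        return result;
    }
}
```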

            People

              Assignee: Stefan Egli (stefanegli)
              Reporter: Stefan Egli (stefanegli)