Apache Ozone / HDDS-6510

Incremental Checkpointing Support


Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      Currently, installing a snapshot on an OM or SCM follower means taking a full checkpoint of the RocksDB (RDB) and sending it to the follower. As the data stored in RDB grows, transmitting the whole checkpoint takes longer and longer, which is costly. If the leader has already truncated the Raft logs the follower needs by the time the transfer finishes, the follower has to install a new snapshot, and this can happen repeatedly.

      As an example from a test on OM: with the Raft log index at 570767469, the follower took around 13 minutes to install the snapshot. Ozone is designed to overcome the limitations of keeping metadata in memory, so it should be able to hold far more than a few hundred million entries. Once the OM reaches that scale, every snapshot installation becomes a big problem: in a 3-node HA setup, only two Raft peers are working during the installation, and that condition is fragile.

      Another statistic: for 16 hundred million (1.6 billion) keys, the om.db directory is 45 GB, which is around 2.8 GB per hundred million keys. This was measured using the createKey API.
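To make the density figure concrete, here is a small back-of-the-envelope calculation based only on the two numbers reported above (1.6 billion keys, 45 GB). The helper `projected_db_size_gb` is a hypothetical name, and the linear extrapolation is an assumption: real RocksDB sizes depend on key length, compression, and compaction state.

```python
# Figures reported in this issue.
KEYS = 16 * 100_000_000   # "16 hundred million" keys = 1.6 billion
DB_SIZE_GB = 45           # size of the om.db directory

# Density expressed three ways.
gb_per_hundred_million = DB_SIZE_GB / (KEYS / 100_000_000)
keys_per_gb = KEYS / DB_SIZE_GB
bytes_per_key = DB_SIZE_GB * 10**9 / KEYS  # assumes 1 GB = 10**9 bytes


def projected_db_size_gb(num_keys):
    """Linear extrapolation of om.db size (an assumption, not a measurement)."""
    return num_keys * DB_SIZE_GB / KEYS


print(f"{gb_per_hundred_million:.2f} GB per hundred million keys")
print(f"{keys_per_gb / 1e6:.1f} million keys per GB")
print(f"{bytes_per_key:.1f} bytes per key")
print(f"{projected_db_size_gb(10_000_000_000):.0f} GB projected for 10 billion keys")
```

Under this linear assumption, a full checkpoint for 10 billion keys would be several times larger than the 45 GB measured here, which is the growth the issue is concerned about.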

      To solve the problem, we should support incremental checkpointing: send only the small increment since the previous checkpoint instead of the whole RDB checkpoint, and thus reduce the transmission time. I recommend referring to the implementation in Apache Flink, except that we need to store the diff between checkpoints locally instead of in another storage system.
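A minimal sketch of the idea, not a proposed implementation: RocksDB SST files are immutable once written, so an increment can be approximated as the set of files in the leader's new checkpoint that the follower does not already hold. The function name `checkpoint_diff` and the name-to-size mapping are hypothetical. The sketch assumes the follower's files came from an earlier checkpoint of the same leader DB, so that an SST file with the same name and size has the same content.

```python
def checkpoint_diff(follower_files, leader_files):
    """Compute what to transfer to bring a follower up to a new checkpoint.

    Both arguments map file name -> size in bytes for one checkpoint directory.
    Returns (to_send, to_delete) as sorted lists of file names.
    """
    to_send = []
    for name, size in leader_files.items():
        # Only SST files are treated as immutable and reusable. Small metadata
        # files (MANIFEST, CURRENT, OPTIONS, ...) are always re-sent.
        reusable = name.endswith(".sst") and follower_files.get(name) == size
        if not reusable:
            to_send.append(name)
    # Files the follower holds that the new checkpoint no longer references.
    to_delete = [name for name in follower_files if name not in leader_files]
    return sorted(to_send), sorted(to_delete)


previous = {"000010.sst": 100, "000011.sst": 200, "MANIFEST-000005": 50, "CURRENT": 16}
latest = {"000011.sst": 200, "000012.sst": 300, "MANIFEST-000007": 60, "CURRENT": 16}

to_send, to_delete = checkpoint_diff(previous, latest)
print("send:", to_send)
print("delete:", to_delete)
```

In this toy example only one new SST file plus the metadata files would be transferred, instead of the whole checkpoint. A real design would also need to verify content (for example with checksums) rather than rely on name and size alone.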

       

      Attachments

        1. 2022-03-15 7.58.44.png
          331 kB
          Xu Shao Hong

            People

              Assignee: Nibiruxu Xu Shao Hong
              Reporter: Nibiruxu Xu Shao Hong
              Votes: 0
              Watchers: 5
