  1. HBase
  2. HBASE-19397

Design procedures for ReplicationManager to notify peer change event from master


Details

    • Reviewed
      Introduce 5 procedures to do peer modifications:
      AddPeerProcedure
      RemovePeerProcedure
      UpdatePeerConfigProcedure
      EnablePeerProcedure
      DisablePeerProcedure

      The procedures are all executed in the following stages:
      1. Call the pre CP hook; if an exception is thrown, give up.
      2. Check whether the operation is valid; if not, give up.
      3. Update the peer storage. Note that once we have entered this stage, we cannot roll back any more.
      4. Schedule sub-procedures to refresh the peer config on every RS.
      5. Do post cleanup, if any.
      6. Call the post CP hook. Any exception thrown here is ignored, since the work has already been done.
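
The stage ordering above can be sketched as a small, self-contained Java model. This is only an illustration: `PeerModificationSketch`, `Steps` and the log labels are hypothetical names, not the HBase classes, and the real procedures run inside the master's procedure framework rather than in a single method call.

```java
import java.util.List;

/** Hypothetical, simplified model of the six stages; not the actual HBase classes. */
class PeerModificationSketch {

  /** Hypothetical callbacks standing in for CP hooks, peer storage and RS refresh. */
  interface Steps {
    default void preHook() throws Exception {}        // stage 1
    default void checkValid() throws Exception {}     // stage 2
    default void updateStorage() {}                   // stage 3
    default void refreshOnAllRegionServers() {}       // stage 4
    default void postCleanup() {}                     // stage 5
    default void postHook() throws Exception {}       // stage 6
  }

  /** Returns true if the modification was applied, false if we gave up before touching storage. */
  static boolean run(Steps steps, List<String> log) {
    try {
      steps.preHook();
      log.add("PRE_HOOK");
      steps.checkValid();
      log.add("CHECK");
    } catch (Exception e) {
      // Stages 1 and 2 are the only points where giving up is still possible.
      log.add("GIVE_UP");
      return false;
    }
    // From here on there is no rollback.
    steps.updateStorage();
    log.add("UPDATE_STORAGE");
    steps.refreshOnAllRegionServers();
    log.add("REFRESH_RS");
    steps.postCleanup();
    log.add("POST_CLEANUP");
    try {
      steps.postHook();
      log.add("POST_HOOK");
    } catch (Exception e) {
      // The work is already done, so a failing post hook is ignored.
      log.add("POST_HOOK_IGNORED");
    }
    return true;
  }
}
```

In this sketch a failure in stage 1 or 2 leaves the storage untouched, while a failure in stage 6 does not change the outcome.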

      The procedure holds an exclusive lock on the peer id, so there are no concurrent modifications on a single peer.
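
To illustrate what an exclusive lock keyed by peer id means, here is a minimal sketch. It assumes a simple in-memory owner map; `PeerLockSketch` and its methods are hypothetical, and the actual lock is managed by the master's procedure scheduler, not by a class like this.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-peer exclusive lock: at most one procedure id owns a given peer id. */
class PeerLockSketch {
  // peerId -> id of the procedure currently holding the lock
  private final ConcurrentHashMap<String, Long> owners = new ConcurrentHashMap<>();

  /** Succeeds if the peer is unlocked or is already held by the same procedure. */
  boolean tryAcquire(String peerId, long procId) {
    Long prev = owners.putIfAbsent(peerId, procId);
    return prev == null || prev == procId;
  }

  /** Releases the lock only if this procedure is the current owner. */
  void release(String peerId, long procId) {
    owners.remove(peerId, procId);
  }
}
```

Two procedures on different peers can proceed in parallel, while a second procedure on the same peer has to wait until the first one releases the lock.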

      And now it is guaranteed that once the procedure is done, the peer modification has already taken effect on all RSes.
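
The guarantee follows from the parent procedure finishing only after every per-RS refresh has completed. A rough sketch of that fan-out-and-wait pattern is below; `RefreshFanOutSketch` and `refreshAll` are hypothetical names, and plain `CompletableFuture` tasks stand in for the real sub-procedures dispatched to region servers.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical fan-out: one refresh task per region server, parent waits for all of them. */
class RefreshFanOutSketch {

  /** Returns only after refreshOnServer has completed for every server in the list. */
  static void refreshAll(List<String> servers, Consumer<String> refreshOnServer) {
    CompletableFuture<?>[] subs = servers.stream()
        .map(rs -> CompletableFuture.runAsync(() -> refreshOnServer.accept(rs)))
        .toArray(CompletableFuture[]::new);
    // join() blocks until all sub-tasks are done, so "done" implies "applied everywhere".
    CompletableFuture.allOf(subs).join();
  }
}
```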

      Abstract a storage layer for replication peer/queue management, and refactor the upper layer to remove zk-related naming/code/comments.

      Add pre/postExecuteProcedures CP hooks to RegionServerObserver, and add a permission check for the executeProcedures method, which requires the caller to be the system user or a super user.

      On rolling upgrade: do not make any replication peer modifications during the rolling upgrade. There are no pb/layout changes to the peer/queue storage on zk.

    Description

      After we store peer state / peer queue information in an hbase table, an RS can no longer track peer config changes by adding a watcher on a znode.

      So we need to design procedures for ReplicationManager to notify peer change events. The replication rpc interfaces which may be implemented by procedures are the following:

      1. addReplicationPeer
      2. removeReplicationPeer
      3. enableReplicationPeer
      4. disableReplicationPeer
      5. updateReplicationPeerConfig
      

      BTW, our RS states will still be stored in ZooKeeper, so when an RS crashes, the tracker that triggers the transfer of the crashed RS's queues will still be a ZooKeeper tracker. We do NOT need to implement that with procedures.

      As we will release 2.0 in the next few weeks and HBASE-15867 cannot be resolved before the release, I'd prefer to create a new feature branch for HBASE-15867.

      Attachments

        1. HBASE-19397-master-v2.patch
          732 kB
          Duo Zhang
        2. HBASE-19397-branch-2.patch
          733 kB
          Duo Zhang
        3. HBASE-19397-master-v1.patch
          733 kB
          Duo Zhang
        4. HBASE-19397-master-v1.patch
          733 kB
          Duo Zhang
        5. HBASE-19397-master.patch
          732 kB
          Duo Zhang

        Issue Links

          1. Implement a general framework to execute remote procedure on RS (Sub-task, Closed, Duo Zhang)
          2. Add UTs for the new lock type PEER (Sub-task, Closed, Guanghao Zhang)
          3. Master side changes for moving peer modification from zk watcher to procedure (Sub-task, Closed, Duo Zhang)
          4. RS side changes for moving peer modification from zk watcher to procedure (Sub-task, Closed, Zheng Hu)
          5. Client side changes for moving peer modification from zk watcher to procedure (Sub-task, Closed, Guanghao Zhang)
          6. Abstract a replication storage interface to extract the zk specific code (Sub-task, Closed, Duo Zhang)
          7. Add UTs for testing concurrent modifications on replication peer (Sub-task, Closed, Guanghao Zhang)
          8. Procedure id is missing in the response of peer related operations (Sub-task, Closed, Duo Zhang)
          9. Rewrite ReplicationPeer with the new replication storage interface (Sub-task, Closed, Guanghao Zhang)
          10. Add peer lock test for shell command list_locks (Sub-task, Closed, Guanghao Zhang)
          11. Use slf4j instead of commons-logging in new, just-added Peer Procedure classes (Sub-task, Closed, Duo Zhang)
          12. Add UTs to test retry on update zk failure (Sub-task, Closed, Duo Zhang)
          13. Remove ReplicationQueuesClient, use ReplicationQueueStorage directly (Sub-task, Closed, Duo Zhang)
          14. Remove ReplicationQueues, use ReplicationQueueStorage directly (Sub-task, Closed, Duo Zhang)
          15. Reimplement ReplicationPeers with the new replication storage interface (Sub-task, Closed, Zheng Hu)
          16. Create replication endpoint asynchronously when adding a replication source (Sub-task, Closed, Duo Zhang)
          17. Add peer cluster key check when add new replication peer (Sub-task, Closed, Guanghao Zhang)
          18. Clean up the replication queues in the postPeerModification stage when removing a peer (Sub-task, Closed, Duo Zhang)
          19. Add permission check for executeProcedures in AccessController (Sub-task, Closed, Duo Zhang)
          20. Introduce a thread at RS side to call reportProcedureDone (Sub-task, Closed, Duo Zhang)
          21. All rs should already start work with the new peer change when replication peer procedure is finished (Sub-task, Closed, Guanghao Zhang)
          22. Fix locking for peer modification procedure (Sub-task, Closed, Duo Zhang)
          23. Replace ReplicationStateZKBase with ZKReplicationStorageBase (Sub-task, Closed, Zheng Hu)
          24. Use KeyLocker instead of ReentrantLock in PeerProcedureHandlerImpl (Sub-task, Closed, Duo Zhang)
          25. Move the logic in ReplicationZKNodeCleaner to ReplicationChecker and remove ReplicationZKNodeCleanerChore (Sub-task, Closed, Duo Zhang)
          26. Remove TestReplicationAdminUsingProcedure (Sub-task, Closed, Duo Zhang)
          27. Race in start and terminate of a replication source after we async start replicatione endpoint (Sub-task, Closed, Duo Zhang)
          28. TestReplicationAdmin.testConcurrentPeerOperations hangs (Sub-task, Closed, Guanghao Zhang)
          29. Fix checkstyle issues (Sub-task, Closed, Duo Zhang)


            People

              zhangduo Duo Zhang
              openinx Zheng Hu
              Votes: 0
              Watchers: 8
