HBase / HBASE-26233

The region replication framework should not be built upon the general replication framework


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-3
    • Component/s: read replicas
    • Labels: None
    • Hadoop Flags: Reviewed

    Release Note
      In this issue we re-implement the 'Async WAL Replication' mechanism for region replication, decoupling it from the general replication framework.

      We now extend the MVCC write entry so that it can carry an action to run once the entry is completed. This lets us attach the WAL edits to that action and send them out directly once the write entry completes, without touching the actual WAL system. As a result, the special 'region_replica_replication' peer is no longer needed.
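
      The completion-action idea can be pictured with a small standalone model. Everything in it (MvccSketch, WriteEntry.attachCompletionAction, begin, complete) is a hypothetical simplification written for illustration, not HBase's MultiVersionConcurrencyControl API. It assumes one plausible ordering rule: a write's attached action (standing in for "send this write's WAL edits to the secondary replicas") runs only once that write and all older writes are complete, so edits leave in write order and no WAL involvement is needed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical, simplified MVCC whose write entries can carry completion actions.
 * Illustration only; this is not HBase's MultiVersionConcurrencyControl API.
 */
public class MvccSketch {

  /** A pending write. Attached actions run once the write becomes visible to readers. */
  public static final class WriteEntry {
    private final long writeNumber;
    private boolean completed;
    private final List<Runnable> completionActions = new ArrayList<>();

    private WriteEntry(long writeNumber) {
      this.writeNumber = writeNumber;
    }

    /** Hypothetical hook, e.g. "send this write's WAL edits to the secondary replicas". */
    public void attachCompletionAction(Runnable action) {
      completionActions.add(action);
    }
  }

  // Writes in begin() order; the head is the oldest write not yet visible.
  private final Deque<WriteEntry> pending = new ArrayDeque<>();
  private long nextWriteNumber = 1;
  private long readPoint = 0;

  public synchronized WriteEntry begin() {
    WriteEntry entry = new WriteEntry(nextWriteNumber++);
    pending.addLast(entry);
    return entry;
  }

  /**
   * Marks the entry complete, then advances the read point across every leading
   * completed entry, running each one's actions in write-number order. A write that
   * finishes early therefore waits until all older writes have completed.
   */
  public synchronized void complete(WriteEntry entry) {
    entry.completed = true;
    while (!pending.isEmpty() && pending.peekFirst().completed) {
      WriteEntry head = pending.pollFirst();
      readPoint = head.writeNumber;
      for (Runnable action : head.completionActions) {
        action.run(); // should stay cheap, e.g. hand the edits to an async sender
      }
    }
  }

  public synchronized long getReadPoint() {
    return readPoint;
  }
}
```

      For example, if a second write completes before the first, its attached action is held back; completing the first write then runs both actions, first then second, and moves the read point to 2. Running actions while holding the lock keeps the sketch short; a real sender would more likely just enqueue the edits for an asynchronous RPC.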

      We also introduce a new replicateToReplica method on the region server side for receiving the WAL edits; its logic is much simpler than that of the old replay method.

      After this issue, the META table no longer needs special treatment in region replication because, as noted above, we no longer depend on the general replication framework. However, to avoid affecting existing usage too much, the config 'hbase.region.replica.replication.catalog.enabled' is kept: you still need to enable this flag if you want 'Async WAL Replication' support for the META table.

      Rolling upgrades with 'Async WAL Replication' enabled are supported. During a rolling upgrade, the 'region_replica_replication' peer is removed automatically once the master has been upgraded. On the region server side, if the new replicateToReplica method is not available when replicating to secondary replicas, the region server automatically falls back to the replay method.
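
      The sender-side behavior described above (prefer the new replicateToReplica call, fall back to replay when the target region server has not been upgraded yet) can be sketched as follows. All names here (ReplicaFallbackSketch, ReplicaStub, MethodNotSupportedException, FakeServer, send) are hypothetical, edits are modeled as plain strings, and the note does not say how HBase detects that the method is missing, so signalling it with a dedicated exception is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the rolling-upgrade fallback; not HBase code. */
public class ReplicaFallbackSketch {

  /** Hypothetical signal that the remote region server does not know the new method. */
  public static class MethodNotSupportedException extends Exception {
    public MethodNotSupportedException(String message) {
      super(message);
    }
  }

  /** Hypothetical RPC surface of the region server hosting a secondary replica. */
  public interface ReplicaStub {
    void replicateToReplica(List<String> edits) throws MethodNotSupportedException;

    void replay(List<String> edits);
  }

  /** Sends edits with the new method, falling back to replay; returns the method used. */
  public static String send(ReplicaStub stub, List<String> edits) {
    try {
      stub.replicateToReplica(edits);
      return "replicateToReplica";
    } catch (MethodNotSupportedException e) {
      // Target not upgraded yet during a rolling upgrade: use the old path.
      stub.replay(edits);
      return "replay";
    }
  }

  /** Test double standing in for an old or an upgraded region server. */
  public static class FakeServer implements ReplicaStub {
    private final boolean upgraded;
    public final List<String> applied = new ArrayList<>();

    public FakeServer(boolean upgraded) {
      this.upgraded = upgraded;
    }

    @Override
    public void replicateToReplica(List<String> edits) throws MethodNotSupportedException {
      if (!upgraded) {
        throw new MethodNotSupportedException("replicateToReplica");
      }
      applied.addAll(edits);
    }

    @Override
    public void replay(List<String> edits) {
      applied.addAll(edits);
    }
  }
}
```

      This sketch probes the new method on every call. A real implementation might remember, per target server, which method works, but such a cache must be invalidated when that server is restarted on the new version during the upgrade.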

      The related sections in our ref guide are:

      http://hbase.apache.org/book.html#async.wal.replication
      http://hbase.apache.org/book.html#async.wal.replication.meta
      http://hbase.apache.org/book.html#_async_wal_replication_for_meta_table_as_of_hbase_3_0_0

    Description

      At least on the source path, where we track the edits, region replication should not rely on the general replication framework.

      The difficulty in switching to table-based storage is that the WAL system and the replication system depend heavily on each other, so storing replication peer and queue data in an HBase table would introduce a cyclic dependency.
      And since HBASE-18070, even the meta WAL provider is integrated with the replication system, which makes things more difficult.

      But in general, losing some edits is not a big deal for region replication, since a flush can fix everything. This means we do not need the heavyweight tracking system that the general replication framework provides.

      We should find a more lightweight way to do region replication.
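
      A minimal sketch of why "lose some edits, then flush" can replace heavy tracking, assuming a secondary replica can catch up from the primary's flushed files once a flush covering the dropped edits is visible to it. The class and its methods (LossTolerantReplicaSketch, offer, onFlushCompleted, the flush-request counter) are hypothetical and not part of HBase. The point it illustrates: instead of a durable queue of pending edits, the primary only remembers the highest dropped sequence id and asks for a flush that reaches past it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of loss-tolerant replication to one secondary replica; not HBase code. */
public class LossTolerantReplicaSketch {
  private boolean needsFlush = false;
  private long maxDroppedSeqId = -1;
  private int flushRequests = 0;
  private final List<Long> shipped = new ArrayList<>();

  /** Offers one batch of edits; sendSucceeded simulates the RPC outcome. Returns true if shipped. */
  public boolean offer(long seqId, boolean sendSucceeded) {
    if (!needsFlush && sendSucceeded) {
      shipped.add(seqId);
      return true;
    }
    // Either this send failed or we are already out of sync: drop the edits
    // and only remember how far the next flush must reach.
    maxDroppedSeqId = Math.max(maxDroppedSeqId, seqId);
    if (!needsFlush) {
      needsFlush = true;
      flushRequests++; // hypothetical: ask the primary to flush the region
    }
    return false;
  }

  /** Called when a primary flush covering edits up to flushSeqId is visible on the secondary. */
  public void onFlushCompleted(long flushSeqId) {
    if (!needsFlush) {
      return;
    }
    if (flushSeqId >= maxDroppedSeqId) {
      needsFlush = false; // the flushed files contain every dropped edit
    } else {
      flushRequests++; // some dropped edits are newer than this flush; ask again
    }
  }

  public boolean isInSync() {
    return !needsFlush;
  }

  public int getFlushRequests() {
    return flushRequests;
  }

  public List<Long> getShipped() {
    return shipped;
  }
}
```

      For instance, if the edit with sequence id 2 fails to send, edit 3 is dropped as well; a flush up to 2 is not enough, so another flush is requested, and once a flush up to 3 completes the replica resumes receiving edits directly.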

    People

      Assignee: Duo Zhang (zhangduo)
      Reporter: Duo Zhang (zhangduo)