  1. HBase
  2. HBASE-24897

RegionReplicaFlushHandler should handle NoServerForRegionException to avoid aborting RegionServer


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 2.2.6
    • None
    • None

    Description

      While debugging the flaky test TestRegionReplicaReplicationEndpoint, I found that the RS aborted because the flush triggered by RegionReplicaFlushHandler failed. When a new table is created with region replicas, the assignment order may be:

      1. Assign the 0002 replica region, which triggers a primary region flush.
      2. Assign the 0001 replica region, which triggers a primary region flush.
      3. Assign the primary region.

      The primary region flush may fail because the primary region is not open yet, so hbase:meta has no server address for it. The resulting NoServerForRegionException can abort the RS.

       

      2020-08-18 16:56:30,041 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354.
      2020-08-18 16:56:30,558 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140.
      2020-08-18 16:56:31,192 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.
      
      
      2020-08-18 16:58:53,857 ERROR [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] helpers.MarkerIgnoringBase(159): ***** ABORTING region server hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception was thrown *****
      org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in hbase:meta for region testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. containing row 
        at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784)
        at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84)
        at org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
        at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129)
        at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
      

      I think the fix should be to assign the primary region first when the region replica feature is enabled. I will check the implementation of region replica.
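
      The alternative named in the issue title, handling the exception in the flush handler instead of aborting, could look roughly like the sketch below. This is a self-contained illustration and not the committed patch: FakeMeta, triggerFlushInPrimary, and the local NoServerForRegionException class are hypothetical stand-ins that only mimic the behavior shown in the stack trace above, and no real HBase API is used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicaFlushSketch {

  /** Hypothetical stand-in for org.apache.hadoop.hbase.client.NoServerForRegionException. */
  static class NoServerForRegionException extends RuntimeException {
    NoServerForRegionException(String msg) {
      super(msg);
    }
  }

  /** Toy stand-in for hbase:meta: maps a region name to a server address. */
  static final class FakeMeta {
    private final Map<String, String> locations = new ConcurrentHashMap<>();

    void assign(String region, String server) {
      locations.put(region, server);
    }

    String locate(String region) {
      String server = locations.get(region);
      if (server == null) {
        throw new NoServerForRegionException("No server address listed for region " + region);
      }
      return server;
    }
  }

  /**
   * Tries to locate the primary so a flush could be sent to it.
   * Returns true once the primary is located, false if all attempts fail
   * or the thread is interrupted. It never propagates the exception, so
   * the caller has no reason to abort the server.
   */
  static boolean triggerFlushInPrimary(FakeMeta meta, String primary, int maxAttempts, long pauseMs) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        String server = meta.locate(primary);
        // A real handler would send the flush RPC to this server here.
        System.out.println("attempt " + attempt + ": primary is on " + server);
        return true;
      } catch (NoServerForRegionException e) {
        // The primary is not open yet: treat this as retryable, not fatal.
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    FakeMeta meta = new FakeMeta();
    String primary = "t1,,1.primary";
    // Mimic the assignment order from the description: the replica is open
    // and asks for a flush before the primary has been assigned.
    Thread assigner = new Thread(() -> {
      try {
        Thread.sleep(50);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
      meta.assign(primary, "rs1:16020");
    });
    assigner.start();
    boolean triggered = triggerFlushInPrimary(meta, primary, 100, 10);
    assigner.join();
    System.out.println("flush triggered: " + triggered);
  }
}
```

      Retrying with a bounded number of attempts keeps the replica open path independent of the assignment order. If the attempts run out, the sketch only reports failure, which leaves the caller free to log the condition and continue.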

       

            People

              Assignee: Guanghao Zhang (zghao)
              Reporter: Guanghao Zhang (zghao)
