HBASE-20908

Infinite loop on region server if region replicas are reduced

Details

    • Reviewed

    Description

      Steps to reproduce

      hbase(main):003:0> create 'myTable','cf',{REGION_REPLICATION=>3}
      
      
      hbase(main):003:0> put 'myTable','r1','cf:col1','1'
      0 row(s) in 0.1230 seconds
      
      hbase(main):004:0> disable 'myTable'
      0 row(s) in 2.3040 seconds
      
      hbase(main):005:0> alter 'myTable',{REGION_REPLICATION=>1}
      Updating all regions with the new schema...
      1/1 regions updated.
      Done.
      0 row(s) in 11.9550 seconds
      
      hbase(main):006:0> enable 'myTable'
      0 row(s) in 1.2620 seconds
      
      hbase(main):007:0> put 'myTable','r2','cf:col1','1'
      0 row(s) in 0.0060 seconds
      
      

      The replay request below targets a replica region that is no longer present in hbase:meta but is still in the location cache. The region server responds that it is not serving this region:

      com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region d997d9b47a106216b9b117617ec09015 is not online on 10.22.9.76,16020,1531341039091
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3124)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3106)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1714)
      	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22773)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
      

      Eventually, once the cache is refreshed from hbase:meta, replication enters an infinite loop: the location of the removed replica never reappears, so the event can never be replicated and is retried indefinitely.

      java.net.SocketTimeoutException: callTimeout=1200000, callDuration=2181316: Can't get the location null
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:170)
      	at org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint$RetryingRpcCallable.call(RegionReplicaReplicationEndpoint.java:606)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:178)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:105)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:89)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
      	... 5 more
      Caused by: java.io.IOException: HRegionInfo was null in myTable, row=keyvalues={myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:regioninfo/1531262022425/Put/vlen=41/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:seqnumDuringOpen/1531341209944/Put/vlen=8/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:server/1531341209944/Put/vlen=16/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:serverstartcode/1531341209944/Put/vlen=8/seqid=0}
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1289)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1179)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:170)
      	... 8 more
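
      The failure mode described above can be illustrated with a small, self-contained Java sketch. It is not the attached patch and does not use any HBase API: the class `ReplicaReplaySketch`, the `replay` method, the `skipMissing` flag, and the two maps standing in for hbase:meta and the location cache are all hypothetical names introduced for illustration. The sketch assumes that a reasonable remedy is to drop edits destined for a replica that meta no longer lists, instead of retrying them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the replica replay retry loop; not HBase code.
public class ReplicaReplaySketch {

    enum Outcome { REPLAYED, SKIPPED_REPLICA_REMOVED, RETRIES_EXHAUSTED }

    // meta:  replica id -> server, stands in for hbase:meta (source of truth)
    // cache: replica id -> server, stands in for the cached region locations
    static Outcome replay(Map<Integer, String> meta, Map<Integer, String> cache,
                          int replicaId, int maxRetries, boolean skipMissing) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            String cached = cache.get(replicaId);
            if (cached != null && cached.equals(meta.get(replicaId))) {
                return Outcome.REPLAYED; // the server really hosts this replica
            }
            // Models the NotServingRegionException path: evict the stale
            // entry and look the location up again in meta.
            cache.remove(replicaId);
            String fresh = meta.get(replicaId);
            if (fresh != null) {
                cache.put(replicaId, fresh);
                continue; // retry against the refreshed location
            }
            if (skipMissing) {
                // Assumed remedy: the replica was removed, so drop the edit.
                return Outcome.SKIPPED_REPLICA_REMOVED;
            }
            // Without the check, the location never reappears and every
            // further attempt fails the same way.
        }
        return Outcome.RETRIES_EXHAUSTED;
    }

    public static void main(String[] args) {
        // After REGION_REPLICATION was reduced from 3 to 1, meta lists only replica 0.
        Map<Integer, String> meta = new HashMap<>();
        meta.put(0, "rs1");

        // The cache still remembers all three replicas.
        Map<Integer, String> cache = new HashMap<>();
        cache.put(0, "rs1");
        cache.put(1, "rs1");
        cache.put(2, "rs1");

        System.out.println(replay(meta, new HashMap<>(cache), 2, 5, false));
        System.out.println(replay(meta, new HashMap<>(cache), 2, 5, true));
        System.out.println(replay(meta, new HashMap<>(cache), 0, 5, true));
    }
}
```

      The sketch bounds the loop with `maxRetries` so that it terminates. In the scenario reported here the failed batch is resubmitted after each timeout, which is why the region server never makes progress.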
      
      

      Attachments

        1. HBASE-20908.patch
          22 kB
          Ankit Singhal
        2. HBASE-20908_v1.patch
          14 kB
          Ankit Singhal
        3. HBASE-20908_v3.patch
          13 kB
          Ankit Singhal
        4. 20908_v3.patch
          13 kB
          Ted Yu
        5. HBASE-20908_v3-branch-1.patch
          14 kB
          Ankit Singhal
        6. 20908_v3-branch-1.patch
          14 kB
          Ted Yu
        7. HBASE-20908_v3-branch-1.patch
          14 kB
          Ted Yu
        8. HBASE-20908_v3-branch-1.patch
          14 kB
          Ankit Singhal

          People

            ankit@apache.org Ankit Singhal