  1. HBase
  2. HBASE-27961

[HBCK2] Running assigns/unassigns command with large number of files/regions throws CallTimeoutException

Details

    • Reviewed
    • Add support for batching in the following commands: assigns, unassigns and bypass

    Description

      Running the assigns command with a huge list of regions fails with a CallTimeoutException (CTE). Even after splitting the input across multiple files it still fails, and the same command has to be blindly resubmitted again and again until it completes without error.

      The exception seen is as follows:

      Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
      	at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:146)
      	at org.apache.hbase.HBCK2.assigns(HBCK2.java:454)
      	at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1070)
      	at org.apache.hbase.HBCK2.run(HBCK2.java:1028)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
      	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
      	at org.apache.hbase.HBCK2.main(HBCK2.java:1367)
      Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:340)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:594)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
      	at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:141)
      	... 6 more
      Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
      	at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:222)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:424)
      	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
      	at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:107)
      	at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:134)
      	at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
      	at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
      	at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
      	at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
      	at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
      	at java.lang.Thread.run(Thread.java:750)
      Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
      	at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:135)
      	... 6 more
      
      

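      The same issue should apply to most other commands that take a list of regions, such as unassigns and bypass.

      Proposed fix

      • This can be fixed by batching the list of regions, whether passed on the command line or read from input files specified via the -i argument, so that each RPC carries only a bounded number of regions.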
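
      The batching idea could be sketched roughly as follows. This is a minimal illustration, not the actual HBCK2 patch: the class name BatchingSketch, the method partition, and the batch size of 4 are all hypothetical, and the per-batch RPC is only indicated by a comment.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

  /** Splits items into consecutive batches of at most batchSize elements. */
  public static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    // Placeholder region names; real input would come from the command line or -i files.
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      regions.add("region-" + i);
    }
    for (List<String> batch : partition(regions, 4)) {
      // Hypothetical: one assigns/unassigns/bypass RPC per batch would go here,
      // so that no single call has to cover the whole region list.
      System.out.println("submitting " + batch.size() + " regions: " + batch);
    }
  }
}
```

      Keeping each call small bounds the work done per RPC, so an individual request is less likely to exceed the rpcTimeout seen in the trace above. A suitable batch size would depend on the cluster and is left as an assumption here.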

            People

              nihaljain.cs Nihal Jain