Description
An example:
We have a table "A" which is in RSGroup "group1". "bd806f94a53be74e65bd76e1e6e16e5a" is a region of A and is opened on RS "rs1".
Two steps reproduce this bug:
step1: Split region bd806f94a53be74e65bd76e1e6e16e5a
step2: Before CatalogJanitor clears the parent region, the client runs one of these shell commands: move_servers_rsgroup 'group2', ['rs1:60020'] or balance_rsgroup 'group1'
The client then gets the exceptions below, and the moves of the remaining regions are interrupted.
ERROR: org.apache.hadoop.hbase.client.DoNotRetryRegionException: bd806f94a53be74e65bd76e1e6e16e5a is not OPEN
    at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:189)
    at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:71)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:755)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.move(AssignmentManager.java:560)
    at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.moveServers(RSGroupAdminServer.java:349)
    at org.apache.hadoop.hbase.rsgroup.FGRSGroupAdminServer.moveServers(FGRSGroupAdminServer.java:119)
    at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.moveServers(RSGroupAdminEndpoint.java:209)
    at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:13870)
    at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:813)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
For usage try 'help "move_servers_rsgroup"'

ERROR: org.apache.hadoop.hbase.client.DoNotRetryRegionException: bd806f94a53be74e65bd76e1e6e16e5a is not OPEN
    at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:189)
    at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:71)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:755)
    at org.apache.hadoop.hbase.master.assignment.AssignmentManager.moveAsync(AssignmentManager.java:565)
    at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.balanceRSGroup(RSGroupAdminServer.java:516)
    at org.apache.hadoop.hbase.rsgroup.FGRSGroupAdminServer.balanceRSGroup(FGRSGroupAdminServer.java:164)
    at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint$RSGroupAdminServiceImpl.balanceRSGroup(RSGroupAdminEndpoint.java:296)
    at org.apache.hadoop.hbase.protobuf.generated.RSGroupAdminProtos$RSGroupAdminService.callMethod(RSGroupAdminProtos.java:13890)
    at org.apache.hadoop.hbase.master.MasterRpcServices.execMasterService(MasterRpcServices.java:813)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
For usage try 'help "balance_rsgroup"'
After the split, the parent region is no longer used and will eventually be cleared by CatalogJanitor. So should we skip it when running move_servers_rsgroup or balance_rsgroup?
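
The idea could look roughly like the sketch below. It is not the actual RSGroupAdminServer code: RegionView and SplitParentFilterSketch are hypothetical stand-ins made up for illustration, and the splitParent flag only assumes that the master can tell a split parent apart from a live region. The point is just that split parents get filtered out before any move is requested, so one stale region cannot abort the rest of the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the master's view of a region; NOT the real HBase RegionInfo.
final class RegionView {
    final String encodedName;
    // Assumption: true once the region has been split and is only waiting for cleanup.
    final boolean splitParent;

    RegionView(String encodedName, boolean splitParent) {
        this.encodedName = encodedName;
        this.splitParent = splitParent;
    }
}

// Hypothetical helper illustrating the proposed behavior.
final class SplitParentFilterSketch {
    // Returns only the regions that are still worth moving, skipping split parents.
    static List<RegionView> regionsToMove(List<RegionView> regionsOnServer) {
        List<RegionView> movable = new ArrayList<>();
        for (RegionView region : regionsOnServer) {
            if (region.splitParent) {
                // CatalogJanitor will remove this region later; asking for a move
                // would fail with "is not OPEN" and interrupt the remaining moves.
                continue;
            }
            movable.add(region);
        }
        return movable;
    }

    public static void main(String[] args) {
        // The daughter names are placeholders, not real encoded region names.
        List<RegionView> onRs1 = Arrays.asList(
            new RegionView("bd806f94a53be74e65bd76e1e6e16e5a", true),
            new RegionView("daughter-a", false),
            new RegionView("daughter-b", false));
        for (RegionView region : regionsToMove(onRs1)) {
            System.out.println("move " + region.encodedName);
        }
    }
}
```

With the sample input in main, only the two daughter regions are printed; the split parent is skipped and left for CatalogJanitor.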