HBase / HBASE-20828 Finish-up AMv2 Design/List of Tenets/Specification of operation / HBASE-21051

Possible NPE if ModifyTable and region split happen at the same time

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.1.0, 2.0.1
    • Fix Version/s: None
    • Component/s: amv2
    • Labels: None

    Description

      Similar to HBASE-20921: the ModifyTable procedure and reopenProcedure do not hold the lock, so other procedures such as split/merge can execute at the same time.

      1. A split happened during ModifyTable. As the log shows, the split was nearly complete.

      2018-08-05 01:28:31,339 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1659): Finished subprocedure(s) of pid=772, state=RUNNABLE:SPLIT_TABLE_REGION_POST_OPERATION, hasLock=true; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=357a7a6a62c76bc2d7ab30a6cc812637, daughterA=b13e5d155b65a5f752f3adda78fcfb6a, daughterB=5be3aadcee68d91c3d1e464865550246; resume parent processing.
      2018-08-05 01:28:31,345 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1296): Finished pid=795, ppid=772, state=SUCCESS, hasLock=false; AssignProcedure table=IntegrationTestBigLinkedList, region=b13e5d155b65a5f752f3adda78fcfb6a, target=e010125048016.bja,60020,1533402809226 in 5.0280sec
      

      2. reopenProcedure began to reopen the parent region by moving it.

      2018-08-05 01:28:31,389 INFO  [PEWorker-11] procedure.MasterProcedureScheduler(631): pid=781, ppid=774, state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure hri=357a7a6a62c76bc2d7ab30a6cc812637, source=e010125048016.bja,60020,1533402809226, destination=e010125048016.bja,60020,1533402809226 checking lock on 357a7a6a62c76bc2d7ab30a6cc812637
      2018-08-05 01:28:31,390 INFO  [PEWorker-3] procedure2.ProcedureExecutor(1296): Finished pid=772, state=SUCCESS, hasLock=false; SplitTableRegionProcedure table=IntegrationTestBigLinkedList, parent=357a7a6a62c76bc2d7ab30a6cc812637, daughterA=b13e5d155b65a5f752f3adda78fcfb6a, daughterB=5be3aadcee68d91c3d1e464865550246 in 21.9050sec
      2018-08-05 01:28:31,518 INFO  [PEWorker-11] procedure2.ProcedureExecutor(1533): Initialized subprocedures=[{pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226}]
      2018-08-05 01:28:31,530 INFO  [PEWorker-15] procedure.MasterProcedureScheduler(631): pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226 checking lock on 357a7a6a62c76bc2d7ab30a6cc812637
      

      3. MoveRegionProcedure fails because the region no longer exists (due to the split).

      2018-08-05 01:28:31,543 ERROR [PEWorker-15] procedure2.ProcedureExecutor(1517): CODE-BUG: Uncaught runtime exception: pid=797, ppid=781, state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, server=e010125048016.bja,60020,1533402809226
      java.lang.NullPointerException
              at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
              at org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1097)
              at org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1125)
              at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1455)
              at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:204)
              at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
              at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
              at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
              at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
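
      The top frame of the trace is ConcurrentHashMap.get, which throws NullPointerException when it is given a null key. A plausible reading is that the server key reaching RegionStates.getOrCreateServer was null because the split parent no longer had a region location, although that is an inference from the trace and not something the log states. The minimal sketch below only demonstrates the JDK behavior; the class, map, and method names are hypothetical and are not HBase code.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: the names below are hypothetical, not HBase classes.
class NullKeyLookupSketch {
    // Stand-in for a per-server map keyed by server name.
    static final ConcurrentHashMap<String, String> serverMap = new ConcurrentHashMap<>();

    // Returns true if looking up the given key throws NullPointerException.
    static boolean lookupThrowsNpe(String serverName) {
        try {
            serverMap.get(serverName);
            return false;
        } catch (NullPointerException e) {
            // ConcurrentHashMap does not permit null keys, even for reads.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookupThrowsNpe("rs1,60020,1")); // absent key: no exception
        System.out.println(lookupThrowsNpe(null));           // null key: NPE is caught
    }
}
```

      A null check before the lookup would avoid the crash, but it would not address the underlying race between the two procedures.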
      

      We need to think this case through and find a definitive solution for it; otherwise, issues like this one and HBASE-20921 will keep coming.
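
      As a rough illustration of the locking gap described above, the toy model below uses a single-permit semaphore as a stand-in for an exclusive table lock. While the parent procedure holds the permit, a concurrent split cannot start. This is only a sketch under that assumption: all names are hypothetical, it does not use the HBase procedure framework, and it is not the fix that was adopted.

```java
import java.util.concurrent.Semaphore;

// Toy model; every name here is hypothetical and unrelated to HBase internals.
class TableLockSketch {
    // One permit stands in for an exclusive table-level lock.
    private final Semaphore tableLock = new Semaphore(1);

    // The parent procedure takes the lock and keeps it across all of its steps.
    boolean beginModifyTable() {
        return tableLock.tryAcquire();
    }

    // The lock is released only when the parent and its subprocedures are done.
    void finishModifyTable() {
        tableLock.release();
    }

    // A split may start only if the table lock is currently free.
    boolean trySplit() {
        if (!tableLock.tryAcquire()) {
            return false; // table is busy, split must wait or be rejected
        }
        tableLock.release();
        return true;
    }

    public static void main(String[] args) {
        TableLockSketch table = new TableLockSketch();
        System.out.println(table.beginModifyTable()); // lock acquired
        System.out.println(table.trySplit());         // rejected while ModifyTable runs
        table.finishModifyTable();
        System.out.println(table.trySplit());         // allowed once the lock is free
    }
}
```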

            People

              Assignee: Allan Yang (allan163)
              Reporter: Allan Yang (allan163)