HBase / HBASE-21625

a runnable procedure v2 does not run

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 3.0.0-alpha-1
    • Fix Version/s: None
    • Component/s: amv2, proc-v2
    • Labels: None

    Description

      This is on a master snapshot from a few weeks ago.
      I haven't looked at the code much yet, but it seems rather fundamental. The procedure comes from meta replica assignment (HBASE-21624), in case that matters w.r.t. the engine initialization; however, the master is functional and other procedures run fine. I can also see lots of other open region procedures with a similar pattern that were initialized before this one and have run fine.
      Currently, there are no other runnable procedures on the master. The procedure list shows many procedures that have succeeded since then, the parent that is blocked on this procedure, and one unrelated RIT procedure that is waiting with a timeout and being updated periodically.

      The procedure itself is

      157156 	157155 	RUNNABLE 	hadoop 	org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure 	Wed Dec 19 17:20:27 PST 2018 	Wed Dec 19 17:20:28 PST 2018 		[ { region => { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, targetServer => { hostName => 'server1', port => '17020', startCode => '1545266805778' } }, {} ]
      

      This is in PST so it's been like that for ~19 hours.
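
For anyone checking the arithmetic, here is a minimal sketch of computing how long a procedure has sat since its last update, using the timestamp format from the procedure listing. The class name `StuckFor` and the "observed at" time are assumptions made up for illustration; the report does not state the exact time of observation.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper, not part of HBase. */
final class StuckFor {
    // Same layout as the procedure listing, e.g. "Wed Dec 19 17:20:28 PST 2018".
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);

    /** Whole hours elapsed between the last-update time and the observation time. */
    static long wholeHoursBetween(String lastUpdate, String observedAt) {
        ZonedDateTime from = ZonedDateTime.parse(lastUpdate, FMT);
        ZonedDateTime to = ZonedDateTime.parse(observedAt, FMT);
        return Duration.between(from, to).toHours();
    }

    public static void main(String[] args) {
        // The second timestamp is an assumed observation time, not taken from the report.
        System.out.println(wholeHoursBetween(
            "Wed Dec 19 17:20:28 PST 2018",
            "Thu Dec 20 12:20:28 PST 2018"));
    }
}
```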

      The only line involving this PID in the log is

      2018-12-19 17:20:27,974 INFO  [PEWorker-4] procedure2.ProcedureExecutor: Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, hasLock=false; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure}]
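
A rough sketch of how the log can be filtered for one PID without also picking up lines where that number only appears as a `ppid=` or as a prefix of a longer PID. `ProcLogGrep` and its methods are hypothetical names for illustration and are not part of HBase; the only thing taken from the logs is the `pid=<number>` token format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of HBase: filters master log lines by procedure id. */
final class ProcLogGrep {
    // Matches "pid=157156" but not "ppid=157156" and not "pid=1571560".
    static Pattern pidPattern(long pid) {
        return Pattern.compile("(?<![A-Za-z])pid=" + pid + "(?!\\d)");
    }

    /** Returns the lines that mention the given pid as their own pid= token. */
    static List<String> linesForPid(List<String> logLines, long pid) {
        Pattern p = pidPattern(pid);
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Abbreviated sample lines modeled on the excerpts in this report.
        List<String> sample = Arrays.asList(
            "Initialized subprocedures=[{pid=157156, ppid=157155, state=RUNNABLE, hasLock=false}]",
            "Took xlock for pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE");
        for (String line : linesForPid(sample, 157156L)) {
            System.out.println(line);
        }
    }
}
```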
      

      Since then, there have been no other useful log lines for this PID, the parent PID, or the region in question. This PEWorker (4) is also alive and has done some work since then, so it's not as if the thread errored out somewhere.

      All the PEWorker-s are waiting for work:

      Thread 158 (PEWorker-16):
        State: TIMED_WAITING
        Blocked count: 1340
        Waited count: 5064
        Stack:
          sun.misc.Unsafe.park(Native Method)
          java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
          java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
          org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:171)
          org.apache.hadoop.hbase.procedure2.AbstractProcedureScheduler.poll(AbstractProcedureScheduler.java:153)
          org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1949)
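
To make the stack above easier to read, here is a toy model of a worker polling a scheduler with a timed condition wait. It is a simplified sketch with hypothetical names (`ToyScheduler`, `enqueue`, `poll`), not HBase's actual `AbstractProcedureScheduler`. It only illustrates the general pattern: a worker parked in `awaitNanos` shows up as TIMED_WAITING, and it can only pick up work that has actually been put on the run queue. Whether that is what happened to pid=157156 is not established by this report.

```java
import java.util.ArrayDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model only; names are hypothetical and this is not HBase's scheduler. */
final class ToyScheduler {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();
    private final ArrayDeque<Long> runQueue = new ArrayDeque<>();

    /** Adds a pid to the run queue and wakes one waiting worker. */
    void enqueue(long pid) {
        lock.lock();
        try {
            runQueue.addLast(pid);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    /** Returns the next pid, or null if nothing was enqueued within the timeout. */
    Long poll(long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (runQueue.isEmpty()) {
                if (nanos <= 0L) {
                    return null; // timed out; a real worker would loop and poll again
                }
                // A thread parked here is reported as TIMED_WAITING in a thread dump.
                nanos = notEmpty.awaitNanos(nanos);
            }
            return runQueue.pollFirst();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ToyScheduler scheduler = new ToyScheduler();
        // Nothing enqueued yet: the poll waits out its timeout and returns null.
        System.out.println(scheduler.poll(50, TimeUnit.MILLISECONDS));
        scheduler.enqueue(157156L);
        System.out.println(scheduler.poll(50, TimeUnit.MILLISECONDS));
    }
}
```

In this model, a procedure whose state says RUNNABLE but which was never passed to `enqueue` would stay invisible to every worker, which is one possible shape of the symptom described here.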
      

      The main assignment procedure for this region is blocked on it:

      157155 		WAITING 	hadoop 	TransitRegionStateProcedure table=hbase:meta, region=534574363, ASSIGN 	Wed Dec 19 17:20:27 PST 2018 	Wed Dec 19 17:20:27 PST 2018 		[ { state => [ '1', '2', '3' ] }, { regionId => '1', tableName => { ... }, startKey => '', endKey => '', offline => 'false', split => 'false', replicaId => '1' }, { initialState => 'REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE', lastState => 'REGION_STATE_TRANSITION_CONFIRM_OPENED', assignCandidate => { hostName => 'server1', port => '17020', startCode => '1545266805778' }, forceNewPlan => 'false' } ]
      
      
      2018-12-19 17:20:27,673 INFO  [PEWorker-9] procedure.MasterProcedureScheduler: Took xlock for pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN
      2018-12-19 17:20:27,809 INFO  [PEWorker-9] assignment.TransitRegionStateProcedure: Starting pid=157155, state=RUNNABLE:REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE, hasLock=true; TransitRegionStateProcedure table=hbase:meta, region=..., ASSIGN; rit=OFFLINE, location=server1,17020,1545266805778; forceNewPlan=false, retain=false
      

          People

            Assignee: Unassigned
            Reporter: Sergey Shelukhin (sershe)