  HBase / HBASE-27773

STUCK Region-In-Transition state

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.11
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None
    • Environment:
      HBase: 2.4.11
      Hadoop: 3.2.4
      ZooKeeper: 3.7.1

    Description

      One problem we encounter with some regularity is regions getting stuck in transition, reported in the master log as `STUCK Region-In-Transition state=OPENING`.

      We have a three-server cluster that runs a full HBase stack: 3 ZooKeeper nodes, an active and a standby HBase master, 3 region servers, and 3 HDFS data nodes.

      We have managed to reproduce the stuck region-in-transition state by rebooting one of the 3 nodes, chosen at random. This is not necessarily the only way to reach this state, but it is the most reliable way we have found to reproduce it. The chances of reaching the stuck state increase if (a) data is being written to HBase while the node reboots, or (b) the rebooted node also hosts the active HBase master.

      Sample logs:

       

      [7745.457s][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 12.736ms
      [10505.454s][info][gc] GC(13) Pause Young (Normal) (G1 Evacuation Pause) 523M->44M(818M) 11.066ms
      2023-04-03 11:26:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
      2023-04-03 11:27:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
      2023-04-03 11:27:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
      2023-04-03 11:28:53,145 INFO  [master/cvp504:16000.Chore.1] master.HMaster: Not running balancer (force=false, metaRIT=false) because 2 region(s) in transition: [state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf, state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3]
      2023-04-03 11:28:53,168 WARN  [master/cvp504:16000.Chore.1] janitor.CatalogJanitor: unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x09,1680499940070.78be037bae2fc201707fa511e90dfbbf., unknown_server=cvp503.sjc.aristanetworks.com,16201,1680499899167/aeris_v2,\x12,1680499940070.b732898573f935b72fb1876c6ff944b3.
      2023-04-03 11:28:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
      2023-04-03 11:28:53,208 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
      2023-04-03 11:29:53,209 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=78be037bae2fc201707fa511e90dfbbf
      2023-04-03 11:29:53,209 WARN  [ProcExecTimeout] assignment.AssignmentManager: STUCK Region-In-Transition state=OPENING, location=cvp504.sjc.aristanetworks.com,16201,1680509017771, table=aeris_v2, region=b732898573f935b72fb1876c6ff944b3
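
      To make it easier to see which regions are stuck and for how long, the warnings above can be summarised with a small script. This is a minimal sketch that assumes the master log keeps exactly the `AssignmentManager` line format shown in the sample; the function name `summarize_stuck_regions` is hypothetical and not part of HBase.

```python
import re
from datetime import datetime

# Pattern for the "STUCK Region-In-Transition" warnings as they appear in the
# sample logs. The location field itself contains commas (host,port,startcode),
# so it is matched lazily up to the ", table=" separator.
STUCK_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+WARN\s+\[ProcExecTimeout\]\s+"
    r"assignment\.AssignmentManager: STUCK Region-In-Transition "
    r"state=(?P<state>\w+), location=(?P<location>\S+?), "
    r"table=(?P<table>[^,\s]+), region=(?P<region>[0-9a-f]+)"
)


def summarize_stuck_regions(lines):
    """Group STUCK warnings by encoded region name.

    Returns a dict mapping region -> table, state, location, first and last
    time the warning was seen, and the number of warnings.
    """
    summary = {}
    for line in lines:
        m = STUCK_RE.search(line)
        if m is None:
            continue  # GC, balancer and CatalogJanitor lines are ignored
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        entry = summary.setdefault(m["region"], {
            "table": m["table"],
            "state": m["state"],
            "location": m["location"],
            "first_seen": ts,
            "last_seen": ts,
            "warnings": 0,
        })
        entry["first_seen"] = min(entry["first_seen"], ts)
        entry["last_seen"] = max(entry["last_seen"], ts)
        entry["warnings"] += 1
    return summary
```

      Passing an open log file (or any iterable of lines) to `summarize_stuck_regions` gives one entry per region; `last_seen - first_seen` is a lower bound on how long the region has been reported as stuck.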

       

      The stuck state is also cleared if we kill the pod running the region server that hosts the region stuck in transition.
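
      Finding the pod to kill means reading the `location=` field of the warning. HBase prints server names as `host,port,startcode`, where the start code is the server's start time in milliseconds since the epoch. The sketch below splits that field; the names `ServerName` and `parse_server_name` are illustrative Python helpers, not an HBase client API, and how a host maps to a pod depends on the deployment.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int  # server start time, milliseconds since the epoch

    @property
    def started_at(self):
        # Integer arithmetic avoids float rounding on the millisecond part.
        seconds, millis = divmod(self.start_code, 1000)
        return (datetime.fromtimestamp(seconds, tz=timezone.utc)
                + timedelta(milliseconds=millis))


def parse_server_name(text):
    """Split 'host,port,startcode' as printed in the location= field."""
    host, port, start_code = text.rsplit(",", 2)
    return ServerName(host, int(port), int(start_code))


# Server names taken from the sample logs above.
stuck_location = parse_server_name("cvp504.sjc.aristanetworks.com,16201,1680509017771")
unknown_server = parse_server_name("cvp503.sjc.aristanetworks.com,16201,1680499899167")
```

      In the sample logs the stuck regions are located on `cvp504`, while the `CatalogJanitor` warning still lists the same two regions under an `unknown_server` entry for `cvp503` with an earlier start code. Comparing `started_at` for the two names against the reboot time may help confirm which server instance each entry refers to.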

      Attachments

        1. config.txt
          6 kB
          Grigore Lupescu
        2. Locks.png
          111 kB
          Aaron Beitch
        3. Procedures.png
          114 kB
          Aaron Beitch

          People

            Assignee: Unassigned
            Reporter: Grigore Lupescu
            Votes: 0
            Watchers: 4
