HBASE-21844

Master could get stuck in initializing state while waiting for meta


Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-alpha-1
    • Fix Version/s: None
    • Component/s: master, meta
    • Labels: None

    Description

      If the active master crashes after the meta server dies, there is a small chance that the master ends up in a state where ZK says meta is OPEN, but the hosting server is dead and there is no active SCP (ServerCrashProcedure) to recover it (perhaps the SCP was aborted and the procWALs were corrupted). In this case waitForMetaOnline never returns.

       

      We've seen this happen a few times after a temporary HDFS outage. The following log line shows this state.

       

      2019-01-17 18:55:48,497 WARN  [master/************:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1547780128227, server=*************,16020,1547776821322}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

       

      I'm still investigating how the master gets into this bad state and how to prevent it. Regardless, the master should be able to recover during a restart by initiating a new SCP for the dead server to fix meta.
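
      To make the proposed recovery concrete, here is a minimal sketch of the check the master could run at startup. It is a simplified model and not the actual patch: MetaRecoverySketch, MasterView, RegionState, needsNewScp and recoverMetaIfStuck are hypothetical names, not HBase classes or methods, and adding the server to a set merely stands in for submitting a real SCP.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the startup check; none of these
// types are real HBase classes.
public class MetaRecoverySketch {

    enum RegionState { OPEN, OPENING, CLOSED, OFFLINE }

    /** Snapshot of what the master knows while it is becoming active. */
    static final class MasterView {
        final RegionState metaState;      // meta state as recorded in ZK
        final String metaServer;          // server that ZK says hosts meta
        final Set<String> liveServers;    // servers currently known to be alive
        final Set<String> serversWithScp; // servers that already have an active SCP

        MasterView(RegionState metaState, String metaServer,
                   Set<String> liveServers, Set<String> serversWithScp) {
            this.metaState = metaState;
            this.metaServer = metaServer;
            this.liveServers = liveServers;
            this.serversWithScp = serversWithScp;
        }
    }

    /**
     * True for the stuck state described in this issue: ZK says meta is
     * OPEN, but the hosting server is dead and no SCP is recovering it.
     */
    static boolean needsNewScp(MasterView v) {
        return v.metaState == RegionState.OPEN
            && !v.liveServers.contains(v.metaServer)
            && !v.serversWithScp.contains(v.metaServer);
    }

    /**
     * If the master is in the stuck state, record a new SCP for the dead
     * meta server (a stand-in for submitting a real procedure).
     * Returns true when a new SCP was scheduled.
     */
    static boolean recoverMetaIfStuck(MasterView v) {
        if (!needsNewScp(v)) {
            return false;
        }
        v.serversWithScp.add(v.metaServer);
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical server names; the real ones are redacted in the log.
        Set<String> live = new HashSet<>(Arrays.asList("rs2.example.com,16020,1547776900000"));
        Set<String> scps = new HashSet<>();
        MasterView view = new MasterView(
            RegionState.OPEN, "rs1.example.com,16020,1547776821322", live, scps);

        System.out.println("scheduled new SCP: " + recoverMetaIfStuck(view));
        System.out.println("still stuck: " + needsNewScp(view));
    }
}
```

      The check is idempotent: once an SCP exists for the dead server, a second call does nothing, so it would be safe to run on every master restart before waiting for meta to come online.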

       

       

            People

              bahramch Bahram Chehrazy