HBase / HBASE-21535

Zombie Master detector is not working


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.2.0, 2.1.1, 2.0.3
    • Fix Version/s: 3.0.0-alpha-1, 2.2.0, 2.1.3, 2.0.5, 2.3.0
    • Component/s: master
    • Labels: None

    Description

      HMaster has an InitializationMonitor thread that detects a zombie HMaster based on hbase.master.initializationmonitor.timeout, and halts the process if hbase.master.initializationmonitor.haltontimeout is set to true.

      After HBASE-19694, the HMaster initialization order was corrected: HMaster is set active only after the ZK system trackers have been initialized, as follows:

       status.setStatus("Initializing ZK system trackers");
       initializeZKBasedSystemTrackers();
       status.setStatus("Loading last flushed sequence id of regions");
       try {
         this.serverManager.loadLastFlushedSequenceIds();
       } catch (IOException e) {
         LOG.debug("Failed to load last flushed sequence id of regions"
             + " from file system", e);
       }
       // Set ourselves as active Master now our claim has succeeded up in zk.
       this.activeMaster = true;

      However, the zombie detector thread is started at the very beginning of finishActiveMasterInitialization(), before activeMaster is set:

       private void finishActiveMasterInitialization(MonitoredTask status) throws IOException,
           InterruptedException, KeeperException, ReplicationException {
         Thread zombieDetector = new Thread(new InitializationMonitor(this),
             "ActiveMasterInitializationMonitor-" + System.currentTimeMillis());
         zombieDetector.setDaemon(true);
         zombieDetector.start();

      When the zombieDetector thread first evaluates its loop condition, master.isActiveMaster() is still false, so the loop in run() exits immediately without waiting, and the monitor can never detect a zombie master.

         @Override
         public void run() {
           try {
             while (!master.isStopped() && master.isActiveMaster()) {
               Thread.sleep(timeout);
               if (master.isInitialized()) {
                 LOG.debug("Initialization completed within allotted tolerance. Monitor exiting.");
               } else {
                 LOG.error("Master failed to complete initialization after " + timeout + "ms. Please"
                     + " consider submitting a bug report including a thread dump of this process.");
                 if (haltOnTimeout) {
                   LOG.error("Zombie Master exiting. Thread dump to stdout");
                   Threads.printThreadInfo(System.out, "Zombie HMaster");
                   System.exit(-1);
                 }
               }
             }
           } catch (InterruptedException ie) {
             LOG.trace("InitMonitor thread interrupted. Existing.");
           }
         }
       }
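      The ordering problem can be reproduced with a small self-contained Java model. This is a sketch under stated assumptions, not HBase code: ZombieDetectorDemo, FakeMaster, Monitor and runScenario are hypothetical names, the monitor records a detection instead of calling System.exit(-1), and it returns after one check instead of looping. Starting the detector before activeMaster is set mirrors the reported behaviour; starting it after the flag is set is shown only as one plausible ordering, not as the committed patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ZombieDetectorDemo {

    /** Hypothetical stand-in for HMaster: only the flags the monitor reads. */
    static class FakeMaster {
        volatile boolean activeMaster = false;
        volatile boolean initialized = false;
        volatile boolean stopped = false;
    }

    /** Simplified InitializationMonitor using the same loop condition as run() above. */
    static class Monitor implements Runnable {
        private final FakeMaster master;
        private final long timeoutMs;
        // Records a detection instead of halting the JVM (assumption for the demo).
        final AtomicBoolean zombieDetected = new AtomicBoolean(false);

        Monitor(FakeMaster master, long timeoutMs) {
            this.master = master;
            this.timeoutMs = timeoutMs;
        }

        @Override
        public void run() {
            try {
                // If activeMaster is still false here, the body never executes.
                while (!master.stopped && master.activeMaster) {
                    Thread.sleep(timeoutMs);
                    if (!master.initialized) {
                        // The real monitor logs an error and halts when haltOnTimeout is set.
                        zombieDetected.set(true);
                    }
                    return; // simplification: the real loop keeps iterating
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /**
     * Simulates a master whose initialization never completes.
     *
     * @param startAfterActive true to start the detector after activeMaster is set
     * @return whether the monitor reported a zombie master
     */
    static boolean runScenario(boolean startAfterActive) {
        FakeMaster master = new FakeMaster();
        Monitor monitor = new Monitor(master, 50);
        Thread detector = new Thread(monitor, "ActiveMasterInitializationMonitor-demo");
        detector.setDaemon(true);
        try {
            if (startAfterActive) {
                master.activeMaster = true;
                detector.start();
            } else {
                detector.start();
                // Let the monitor finish first so the outcome is deterministic:
                // activeMaster is still false when the loop condition is checked.
                detector.join();
                master.activeMaster = true;
            }
            // initialized is never set, simulating a master stuck in initialization.
            detector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return monitor.zombieDetected.get();
    }

    public static void main(String[] args) {
        System.out.println("detector started before activeMaster: " + runScenario(false));
        System.out.println("detector started after activeMaster:  " + runScenario(true));
    }
}
```

      Running main reports false for the first ordering and true for the second. The first scenario joins the detector before flipping the flag so the result does not depend on thread scheduling; in HMaster the two points are separated by several initialization steps, so the detector will typically see activeMaster as false.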

      Attachments

        1. HBASE-21535.v2.patch
          2 kB
          Pankaj Kumar
        2. HBASE-21535.patch
          2 kB
          Pankaj Kumar
        3. HBASE-21535.branch-2.patch
          2 kB
          Pankaj Kumar
        4. HBASE-21535.branch-2.patch
          2 kB
          Michael Stack



            People

              Assignee: Pankaj Kumar (pankaj2461)
              Reporter: Pankaj Kumar (pankaj2461)
              Votes: 0
              Watchers: 6
