Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-16807

RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover

VotersWatch issueWatchersCreate sub-taskLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      It's little weird, but it happened in the product environment that few RegionServer missed master znode create notification on master failover. In that case ZooKeeperNodeTracker will not refresh the cached data and MasterAddressTracker will always return old active HM detail to Region server on ServiceException.

      Though We create region server stub on failure but without refreshing the MasterAddressTracker data.

      In HRegionServer.createRegionServerStatusStub()

        boolean refresh = false; // for the first time, use cached data
          RegionServerStatusService.BlockingInterface intf = null;
          boolean interrupted = false;
          try {
            while (keepLooping()) {
              sn = this.masterAddressTracker.getMasterAddress(refresh);
              if (sn == null) {
                if (!keepLooping()) {
                  // give up with no connection.
                  LOG.debug("No master found and cluster is stopped; bailing out");
                  return null;
                }
                if (System.currentTimeMillis() > (previousLogTime + 1000)) {
                  LOG.debug("No master found; retry");
                  previousLogTime = System.currentTimeMillis();
                }
                refresh = true; // let's try pull it from ZK directly
                if (sleep(200)) {
                  interrupted = true;
                }
                continue;
              }
      

      Here we refresh node only when 'sn' is NULL otherwise it will use same cached data.

      So in above case RegionServer will never report active HMaster successfully until HMaster failover or RegionServer restart.

      Attachments

        1. HBASE-16807.patch
          4 kB
          Pankaj Kumar
        2. HBASE-16807-0.98.patch
          4 kB
          Pankaj Kumar
        3. HBASE-16807-branch-1.patch
          4 kB
          Pankaj Kumar
        4. HBASE-16807-branch-1.3.patch
          4 kB
          Pankaj Kumar
        5. HBASE-16807-branch-1.1.patch
          4 kB
          Pankaj Kumar
        6. HBASE-16807-branch-1.2.patch
          4 kB
          Pankaj Kumar

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            pankaj2461 Pankaj Kumar
            pankaj2461 Pankaj Kumar
            Votes:
            0 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment