  1. HBase
  2. HBASE-28342

Decommissioned hosts should be rejected by the HMaster


Details

      This change introduces the configuration `hbase.master.reject.decommissioned.hosts`. When this property is set to `true`, region servers added to the [decommissioning hosts list](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Admin.html#decommissionRegionServers-java.util.List-boolean-) are matched by hostname only, ignoring RPC port and startcode. When a region server whose hostname matches the list attempts to join the cluster, the Master rejects it by responding with the new `DecommissionedHostRejectedException`.
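
The hostname-only matching described above can be illustrated with a minimal, self-contained sketch. This is not the HMaster implementation: the class `DecommissionedHostCheck` and its methods are hypothetical, and the sketch assumes server names are written as `hostname,port,startcode` strings.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; the class and method names are hypothetical
// and do not come from the HBase code base.
final class DecommissionedHostCheck {
    // Hostnames of decommissioned servers; port and startcode are dropped.
    private final Set<String> decommissionedHosts = new HashSet<>();

    // Extracts the hostname from an assumed "hostname,port,startcode" string.
    static String hostnameOf(String serverName) {
        int comma = serverName.indexOf(',');
        return comma < 0 ? serverName : serverName.substring(0, comma);
    }

    // Records only the hostname of each decommissioned server.
    void decommission(List<String> serverNames) {
        for (String serverName : serverNames) {
            decommissionedHosts.add(hostnameOf(serverName));
        }
    }

    // Returns true when the flag is enabled and the hostname is on the list,
    // regardless of the port or startcode the server reports.
    boolean shouldReject(String serverName, boolean rejectDecommissionedHosts) {
        return rejectDecommissionedHosts
            && decommissionedHosts.contains(hostnameOf(serverName));
    }
}
```

In this sketch, a server that restarts on a decommissioned host and reports a new startcode is still rejected, because only the hostname takes part in the comparison. With the flag set to `false`, the check never rejects.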

    Description

      We had an issue with a cluster, internally at HubSpot, where a decommissioned RegionServer was still being picked up by the HMaster. The host the RegionServer was running on was impaired, and we couldn't cleanly kill the RegionServer, so the HMaster would periodically hear back from the host and remove it from its list of dead hosts.

      We would like to implement a fix so that this doesn't happen. We're thinking of adding a boolean flag to the Decommission RegionServer Admin API that tells the Master to ignore the startcode of the servername. When the flag is true, the host will be rejected every time it comes back, even if it reports a different startcode.

            People

              aalhour Ahmad Alhour
              aalhour Ahmad Alhour
              Votes: 0
              Watchers: 7
