  HBase / HBASE-22538

Prevent graceful_stop.sh from shutting down RS too early before finishing unloading regions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.4.9
    • Fix Version/s: 1.5.0, 1.3.6, 1.4.11
    • Component/s: shell
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      We can stop or restart region servers gracefully using the graceful_stop.sh command.
      This command should guarantee that all regions are moved off a region server before that region server is shut down.

      However, I sometimes saw many requests fail while restarting a region server with this command in our production clusters (v1.2.5).
      Affected clients got many RegionServerStoppedExceptions and exhausted their retry count.

      I found that moving a region took only 0.03 sec, which is suspiciously fast. Moreover, unloading the regions from the region server had not finished, and some regions had not even been closed, when the region server received the shutdown signal.
      Because the region server was stopped while it was still serving regions that had not been closed, clients got many exceptions (RegionServerStoppedException).

      Yet region_mover is supposed to wait until each region is served by another region server (i.e., until meta has been updated):
      https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L153
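The waiting behaviour referenced above can be sketched as a polling loop. This is a minimal sketch, not the actual region_mover.rb code: `wait_until_moved` and the `lookup_server` callable (standing in for a meta lookup of the region's current server name) are hypothetical, and the 60 s / 0.1 s defaults are illustrative assumptions.

```ruby
# Hypothetical sketch: poll until meta no longer records the region on the
# original server, or give up when the deadline passes.
# `lookup_server` is an assumed callable returning the server name string
# currently recorded for the region (or nil while it is in transition).
def wait_until_moved(lookup_server, original, max_wait_seconds: 60, interval: 0.1)
  deadline = Time.now + max_wait_seconds
  while Time.now < deadline
    current = lookup_server.call
    # Exact string equality: the region counts as moved as soon as the
    # recorded name differs from `original` in any way, including case.
    return true if current && current != original
    sleep interval
  end
  false # timed out while the region still appeared to be on `original`
end
```

Because the loop trusts exact string equality, any spurious difference between the two names ends the wait on the first poll, which matches the 0.03 sec moves observed above.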

      I figured out why this early shutdown happens:
      a) Our clusters use upper-case hostnames.
      b) The region server builds its ServerName with a lower-case hostname and sends it to the master.
      https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L542
      c) When meta is updated, the server name keeps its original case.
      https://github.com/apache/hbase/blob/branch-1.2/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L1527
      d) region_mover.rb compares b) and c) directly, with case-sensitive string equality, so the comparison is always false: region_mover concludes that each region has already left the original server and stops waiting immediately.
      https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L91
      https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L52

      I think region_mover should compare the server names obtained from the master and from meta in the same case (lower case).
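The mismatch in steps b) to d), and the proposed lower-case comparison, can be reproduced with plain strings. This is a minimal sketch, not the attached patch: it assumes the `hostname,port,startcode` string form of a ServerName, and the host names, start code, and helper names (`same_server_exact?`, `same_server?`) are made up for illustration.

```ruby
# Hypothetical illustration of the case mismatch; not the HBase patch itself.

# Name as registered with the master: lower-cased hostname (step b).
FROM_MASTER = "rs1.example.com,16020,1559000000000"
# Name as recorded in meta, keeping the original upper case (step c).
FROM_META = "RS1.EXAMPLE.COM,16020,1559000000000"

# Case-sensitive check (step d): never matches for upper-case hostnames.
def same_server_exact?(a, b)
  a == b
end

# Proposed check: compare both names in lower case.
def same_server?(a, b)
  return false if a.nil? || b.nil?
  a.downcase == b.downcase
end

puts same_server_exact?(FROM_MASTER, FROM_META) # prints "false"
puts same_server?(FROM_MASTER, FROM_META)       # prints "true"
```

Normalizing both sides, rather than only one, keeps the check correct regardless of which component lower-cases the hostname.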

      With the patch, I confirmed that region_mover waited until all regions had been moved before triggering the region server shutdown. I also observed only RegionMovedExceptions before the shutdown log entry, and no exceptions after the shutdown started.

      Attachments

        1. HBASE-22538.branch-1.4.001.patch
          0.7 kB
          Jeongdae Kim
        2. HBASE-22538.branch-1.4.002.patch
          0.7 kB
          Jeongdae Kim


          People

            Assignee: Jeongdae Kim
            Reporter: Jeongdae Kim
            Votes: 1
            Watchers: 7
