[HBASE-22538] Prevent graceful_stop.sh from shutting down RS too early before finishing unloading regions - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.4.9
Fix Version/s: 1.5.0, 1.3.6, 1.4.11
Component/s: shell
Labels: None
Hadoop Flags: Reviewed

Description

We can stop or restart region servers gracefully using the graceful_stop.sh command.
This command should guarantee that all regions are moved off a region server before it is shut down.

However, in our production clusters (v1.2.5) I sometimes saw many requests fail while restarting a region server with this command.
Affected clients received many RegionServerStoppedExceptions and exhausted their retry counts.

I found that moving a region took only 0.03 sec, which is too fast. Moving (unloading) the regions on the region server had not finished, and the regions had not even been closed, when the region server received the shutdown signal.
Because a region server that was still serving unclosed regions was stopped, clients got many exceptions (RegionServerStoppedException).

However, region_mover should wait until a region is served by another region server (that is, until meta has changed):
https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L153
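
The intended wait can be illustrated roughly as follows. This is only a sketch of the idea, not the code in region_mover.rb: the method name `wait_until_moved`, its parameters, and the `lookup` callable (standing in for a read of the region's current server from meta) are all hypothetical.

```ruby
# Hypothetical sketch of "wait until meta shows a different server".
# `lookup` is an assumed stand-in for reading the region's server from meta;
# it is not an HBase API.
def wait_until_moved(region, original_server, lookup, max_attempts: 10, delay: 0.0)
  max_attempts.times do
    current = lookup.call(region)
    # The move counts as finished only once meta names a different server.
    return true if current != original_server
    sleep(delay)
  end
  # Gave up: meta still points at the original server.
  false
end
```

Note that a loop of this shape returns immediately if the two server names differ for any reason, including a difference in letter case only.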

I figured out why this early shutdown happened:
a) Our clusters use upper-case hostnames.
b) A region server builds its ServerName with a lower-case hostname, and that name is sent to the master.
https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L542
c) When meta is updated, the server name keeps its original case.
https://github.com/apache/hbase/blob/branch-1.2/hbase-client/src/main/java/org/apache/hadoop/hbase/MetaTableAccessor.java#L1527
d) region_mover.rb simply compares the names from b) and c), so the comparison is always false and region_mover does not wait for the move to finish.
https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L91
https://github.com/apache/hbase/blob/branch-1.2/bin/region_mover.rb#L52

I think region_mover should compare the server names from the master and from meta in the same case (lower case).
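
A minimal sketch of such a case-insensitive comparison is shown here. It is not the attached patch: the helper name `same_server?` and the sample server names (in the usual `hostname,port,startcode` form) are made up for illustration.

```ruby
# Hypothetical helper, not the actual patch: compare two server names
# after normalizing both to lower case.
def same_server?(a, b)
  a.downcase == b.downcase
end

# Illustrative values only: lower-case name as reported to the master,
# and a name that kept its upper-case hostname in meta.
from_master = "rs1.example.com,16020,1559642400000"
from_meta   = "RS1.EXAMPLE.COM,16020,1559642400000"

puts from_master == from_meta             # prints "false"
puts same_server?(from_master, from_meta) # prints "true"
```

Only the hostname part can differ in case, since the port and start code are numeric, so lower-casing the whole string is equivalent to lower-casing the hostname alone.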

With the patch, I confirmed that region_mover waited until all regions had been moved and only then triggered the region server shutdown. (I also observed only RegionMovedException before the shutdown log, and no exceptions after shutdown started.)

Attachments

HBASE-22538.branch-1.4.001.patch (04/Jun/19 10:21, 0.7 kB, Jeongdae Kim)
HBASE-22538.branch-1.4.002.patch (11/Jun/19 00:58, 0.7 kB, Jeongdae Kim)

People

Assignee: Jeongdae Kim
Reporter: Jeongdae Kim
Votes: 1
Watchers: 7

Dates

Created: 04/Jun/19 10:10
Updated: 21/Jun/19 02:20
Resolved: 20/Jun/19 22:25