[HBASE-8537] Dead region server pulled in from ZK - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.98.0, 0.95.1
Component/s: master
Labels: None
Hadoop Flags: Reviewed

Description

When a cluster restarts quickly after a crash, the master still pulls in the dead region server instance from zk, even though a new region server instance on the same host and port has already reported in.

2013-05-12 18:32:52,996 INFO  [IPC Server handler 6 on 36000] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368408767773
....
2013-05-12 18:32:54,653 INFO  [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
2013-05-12 18:32:54,653 INFO  [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368378273768

We should not pull in the second region server instance from zk; it is actually dead. We can figure this out from the hostname and the port, assuming no two region server instances can be alive on the same host and port. To be more cautious, we can check the timestamp as well: the live instance should be the one with the later timestamp, not the one pulled in from zk.
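
A minimal sketch of the proposed check, not the code in the attached patches. It assumes a server name has the form host,port,startcode as in the log lines above; the class StaleServerFilter, the type ServerId, and the method isSupersededBy are hypothetical names used only for illustration and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: decides whether a server name found
// in zk should be ignored because a newer instance already reported in.
final class StaleServerFilter {

    /** Hypothetical stand-in for a region server identity. */
    static final class ServerId {
        final String host;
        final int port;
        final long startCode; // start timestamp in milliseconds

        ServerId(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        /** Parses the "host,port,startcode" form seen in the master log. */
        static ServerId parse(String name) {
            String[] parts = name.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Unexpected server name: " + name);
            }
            return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }
    }

    /**
     * Returns true if some server that already reported in has the same host
     * and port as the zk entry but a later start code. Under the assumption
     * that only one instance can be alive per host and port, the zk entry is
     * then a dead instance and should not be registered.
     */
    static boolean isSupersededBy(ServerId fromZk, List<ServerId> reportedIn) {
        for (ServerId live : reportedIn) {
            boolean sameEndpoint =
                live.host.equalsIgnoreCase(fromZk.host) && live.port == fromZk.port;
            if (sameEndpoint && live.startCode > fromZk.startCode) {
                return true;
            }
        }
        return false;
    }
}
```

With the two instances from the log above, the entry with start code 1368378273768 would be treated as superseded by the one with start code 1368408767773, so the master would skip it.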

Attachments

trunk-8537.patch, 13/May/13 18:40, 1 kB, Jimmy Xiang
trunk-8537_v3.patch, 14/May/13 21:01, 12 kB, Jimmy Xiang
trunk-8537_v2.patch, 13/May/13 23:01, 1 kB, Jimmy Xiang

People

Assignee: Jimmy Xiang

Reporter: Jimmy Xiang

Votes: 0

Watchers: 5

Dates

Created: 13/May/13 17:21

Updated: 23/Sep/13 19:08

Resolved: 14/May/13 22:58