Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
1.7.0
-
None
-
Reviewed
Description
Should not crash the region server if there is a NoNode exception while removing the log
We should inspect the exception and, if it is NoNode, we shouldn't crash. The node may already have been deleted as part of peer teardown.
@Override
public void removeLog(String queueId, String filename) {
  try {
    String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
    znode = ZKUtil.joinZNode(znode, filename);
    ZKUtil.deleteNode(this.zookeeper, znode);
  } catch (KeeperException e) {
    this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId
        + ", filename=" + filename + ")", e);
  }
}
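The handling proposed above could look roughly like the following self-contained sketch. It is not the committed patch: FakeZk, RecordingAbortable, the warnings counter, and the local KeeperException / NoNodeException classes are hypothetical stand-ins so the example runs without HBase or ZooKeeper on the classpath (in ZooKeeper itself, KeeperException.NoNodeException is a subclass of KeeperException). The idea is to catch the NoNode case first and treat it as "already removed", and to keep the abort for every other KeeperException.

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveLogSketch {
  /** Hypothetical, simplified stand-in for org.apache.zookeeper.KeeperException. */
  public static class KeeperException extends Exception {
    public KeeperException(String message) { super(message); }
  }

  /** Hypothetical stand-in for KeeperException.NoNodeException. */
  public static class NoNodeException extends KeeperException {
    public NoNodeException(String path) { super("KeeperErrorCode = NoNode for " + path); }
  }

  /** Hypothetical stand-in for the Abortable; it only records that abort was called. */
  public static class RecordingAbortable {
    public boolean aborted = false;
    public void abort(String why, Throwable cause) { aborted = true; }
  }

  /** Hypothetical in-memory stand-in for the znode tree. */
  public static class FakeZk {
    public final Set<String> znodes = new HashSet<>();
    public void deleteNode(String path) throws KeeperException {
      // Mirror ZooKeeper's behaviour: deleting a missing node raises NoNode.
      if (!znodes.remove(path)) {
        throw new NoNodeException(path);
      }
    }
  }

  private final FakeZk zookeeper;
  private final RecordingAbortable abortable;
  private final String myQueuesZnode;
  public int warnings = 0; // counts NoNode cases that were tolerated

  public RemoveLogSketch(FakeZk zookeeper, RecordingAbortable abortable, String myQueuesZnode) {
    this.zookeeper = zookeeper;
    this.abortable = abortable;
    this.myQueuesZnode = myQueuesZnode;
  }

  public void removeLog(String queueId, String filename) {
    String znode = myQueuesZnode + "/" + queueId + "/" + filename;
    try {
      zookeeper.deleteNode(znode);
    } catch (NoNodeException e) {
      // The znode is already gone (for example, removed during peer teardown).
      // The desired end state is reached, so warn instead of aborting.
      warnings++;
      System.err.println("WARN " + znode + " has already been deleted when removing log");
    } catch (KeeperException e) {
      // Any other ZooKeeper failure keeps the original behaviour.
      abortable.abort("Failed to remove wal from queue (queueId=" + queueId
          + ", filename=" + filename + ")", e);
    }
  }

  public static void main(String[] args) {
    FakeZk zk = new FakeZk();
    RecordingAbortable abortable = new RecordingAbortable();
    RemoveLogSketch queues = new RemoveLogSketch(zk, abortable, "/hbase/replication/rs/rs1");
    // The znode was never created, so this removal hits the NoNode path.
    queues.removeLog("xyz_peer", "wal.1");
    System.out.println("aborted=" + abortable.aborted + ", warnings=" + queues.warnings);
  }
}
```

The specific catch has to come before the general one, since Java rejects a subclass catch clause that follows its superclass. Whether a WARN is sufficient, or the source should also stop processing a queue whose peer is gone, is a separate question that this sketch does not address.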
This was the exception observed on region servers:
2021-02-16 20:11:58,567 FATAL [95922885,xyz_peer] regionserver.HRegionServer - ABORTING region server regionserver-111,60020,1613495922885: Failed to remove wal from queue (queueId=xyz_peer, filename=regionserver-111%2C60020%2C1613495922885.1613505863058)
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/regionserver-111,60020,1613495922885/xyz_peer/regionserver-111%2C60020%2C1613495922885.1613505863058
at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:890)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:238)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1341)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1330)
at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeLog(ReplicationQueuesZKImpl.java:142)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:232)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:222)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:198)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.updateLogPosition(ReplicationSource.java:831)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.shipEdits(ReplicationSource.java:746)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.run(ReplicationSource.java:650)
Issue Links
- breaks
-
HBASE-25741 Deadlock during peer cleanup with NoNodeException
- Resolved
- is a parent of
-
HBASE-25613 [Branch-2 and Master]Handle the NoNode exception in remove log replication in a better way then just log WARN
- Open