HBase / HBASE-25583

Handle the NoNode exception in remove log replication and avoid RS crash


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.7.0
    • Component/s: Replication
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      The region server should not crash if there is a NoNode exception while removing the log.
      We should inspect the exception, and if it is a NoNode exception we should not crash. The node may have been deleted as part of peer teardown.

      @Override
      public void removeLog(String queueId, String filename) {
        try {
          String znode = ZKUtil.joinZNode(this.myQueuesZnode, queueId);
          znode = ZKUtil.joinZNode(znode, filename);
          ZKUtil.deleteNode(this.zookeeper, znode);
        } catch (KeeperException e) {
          // Any KeeperException, including NoNode, aborts the region server here.
          this.abortable.abort("Failed to remove wal from queue (queueId=" + queueId
              + ", filename=" + filename + ")", e);
        }
      }
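      The handling proposed above can be sketched as follows. This is a minimal, self-contained sketch under stated assumptions, not the committed patch: `RemoveLogSketch`, `ZNodeStore`, and `failNextDelete` are hypothetical, and the nested `KeeperException`, `NoNodeException`, and `Abortable` types are local stand-ins for `org.apache.zookeeper.KeeperException`, `KeeperException.NoNodeException`, and HBase's `Abortable`, so the example runs without ZooKeeper or HBase on the classpath. The idea is to catch the NoNode case before the generic `KeeperException` handler, so only other ZooKeeper errors abort the region server.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: stand-in types replace ZooKeeper/HBase classes so it runs standalone.
public class RemoveLogSketch {

    /** Stand-in for org.apache.zookeeper.KeeperException. */
    static class KeeperException extends Exception {
        KeeperException(String msg) { super(msg); }
    }

    /** Stand-in for KeeperException.NoNodeException. */
    static class NoNodeException extends KeeperException {
        NoNodeException(String path) { super("KeeperErrorCode = NoNode for " + path); }
    }

    /** Stand-in for HBase's Abortable. */
    interface Abortable {
        void abort(String why, Throwable e);
    }

    /** Hypothetical in-memory znode store. */
    static class ZNodeStore {
        final Set<String> nodes = new HashSet<>();
        boolean failNextDelete = false; // simulates a ZooKeeper error other than NoNode

        void deleteNode(String path) throws KeeperException {
            if (failNextDelete) {
                failNextDelete = false;
                throw new KeeperException("ConnectionLoss for " + path);
            }
            if (!nodes.remove(path)) {
                throw new NoNodeException(path); // the znode does not exist
            }
        }
    }

    final ZNodeStore zk;
    final Abortable abortable;
    final String myQueuesZnode;

    RemoveLogSketch(ZNodeStore zk, Abortable abortable, String myQueuesZnode) {
        this.zk = zk;
        this.abortable = abortable;
        this.myQueuesZnode = myQueuesZnode;
    }

    public void removeLog(String queueId, String filename) {
        String znode = myQueuesZnode + "/" + queueId + "/" + filename;
        try {
            zk.deleteNode(znode);
        } catch (NoNodeException e) {
            // Already gone (e.g. removed during peer teardown): the desired end state
            // holds, so log and continue instead of aborting.
            System.err.println("WAL znode already removed, ignoring: " + znode);
        } catch (KeeperException e) {
            // Any other ZooKeeper failure still aborts, as in the original code.
            abortable.abort("Failed to remove wal from queue (queueId=" + queueId
                + ", filename=" + filename + ")", e);
        }
    }

    public static void main(String[] args) {
        ZNodeStore store = new ZNodeStore();
        RemoveLogSketch queues = new RemoveLogSketch(store,
            (why, e) -> System.out.println("ABORT: " + why),
            "/hbase/replication/rs/regionserver-111,60020,1613495922885");
        // The WAL znode does not exist, so this logs a message and does not abort.
        queues.removeLog("xyz_peer", "wal.1613505863058");
    }
}
```

      Java requires the `NoNodeException` clause to come before the `KeeperException` clause, since a subclass catch after its superclass is unreachable. In real ZooKeeper code, an equivalent option is a single catch that checks `e.code() == KeeperException.Code.NONODE`.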
      

      This was the exception observed on region servers:

      2021-02-16 20:11:58,567 FATAL [95922885,xyz_peer] regionserver.HRegionServer - ABORTING region server regionserver-111,60020,1613495922885: Failed to remove wal from queue (queueId=xyz_peer, filename=regionserver-111%2C60020%2C1613495922885.1613505863058)
      org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/regionserver-111,60020,1613495922885/xyz_peer/regionserver-111%2C60020%2C1613495922885.1613505863058
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:114)
              at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
              at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:890)
              at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:238)
              at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1341)
              at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1330)
              at org.apache.hadoop.hbase.replication.ReplicationQueuesZKImpl.removeLog(ReplicationQueuesZKImpl.java:142)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:232)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.cleanOldLogs(ReplicationSourceManager.java:222)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:198)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.updateLogPosition(ReplicationSource.java:831)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.shipEdits(ReplicationSource.java:746)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceShipperThread.run(ReplicationSource.java:650)
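      The trace shows the path that reaches `removeLog`: the shipper thread's `shipEdits` calls `updateLogPosition`, which goes through `logPositionAndCleanOldLogs` and `cleanOldLogs`. The model below is a hypothetical simplification, not HBase's implementation. It assumes, as the method name suggests, that `cleanOldLogs` removes the queue entries of every WAL older than the one currently being shipped, and it substitutes an in-memory `znodes` set and lexicographic WAL-name ordering for ZooKeeper and HBase's real ordering. `CleanOldLogsSketch` and its fields are illustrative names. With a NoNode-tolerant `removeLog`, a WAL znode that is already gone is skipped and the remaining older WALs are still cleaned up.

```java
import java.util.HashSet;
import java.util.NavigableSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, simplified model of cleanOldLogs -> removeLog; not HBase code.
public class CleanOldLogsSketch {

    /** Stand-in for a ZooKeeper "node does not exist" error. */
    static class NoNodeException extends Exception {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    /** WAL names this queue tracks in memory (lexicographic order in this sketch). */
    final NavigableSet<String> walsInQueue = new TreeSet<>();
    /** Znodes that actually exist; peer teardown may already have removed some. */
    final Set<String> znodes = new HashSet<>();

    void deleteNode(String wal) throws NoNodeException {
        if (!znodes.remove(wal)) {
            throw new NoNodeException(wal);
        }
    }

    /** NoNode-tolerant removeLog: a missing znode is treated as already removed. */
    void removeLog(String wal) {
        try {
            deleteNode(wal);
        } catch (NoNodeException e) {
            System.err.println("Ignoring: " + e.getMessage());
        }
    }

    /** Removes every tracked WAL strictly older than currentWal. */
    void cleanOldLogs(String currentWal) {
        SortedSet<String> older = walsInQueue.headSet(currentWal); // exclusive bound
        for (String wal : older) {
            removeLog(wal); // modifies znodes only, so iterating the view is safe
        }
        older.clear(); // headSet is a view: this also removes them from walsInQueue
    }

    public static void main(String[] args) {
        CleanOldLogsSketch queue = new CleanOldLogsSketch();
        for (String wal : new String[] {"wal.1", "wal.2", "wal.3"}) {
            queue.walsInQueue.add(wal);
            queue.znodes.add(wal);
        }
        queue.znodes.remove("wal.2"); // simulate a znode deleted elsewhere
        queue.cleanOldLogs("wal.3");
        System.out.println(queue.walsInQueue); // prints [wal.3]
    }
}
```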
      


People

    • Assignee: Sandeep Pal (sandeep.pal)
    • Reporter: Sandeep Pal (sandeep.pal)
    • Votes: 0
    • Watchers: 9
