HBase / HBASE-28154

TestZooKeeper could hang forever

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: test
    • Labels: None

    Description

      This has shown up several times recently in pre-commit results.

      According to the log output, the test is stuck in testRegionServerSessionExpired.

      After replaying the recovered edits for the meta region, we need to flush the memstore. That flush gets stuck, which causes the test to time out.

      This is the last log message for opening hbase:meta:

      2023-10-15T14:37:46,704 INFO  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HRegion(2885): Flushing 1588230740 4/4 column families, dataSize=74 B heapSize=1.22 KB
      

      When the test timed out, we saw this:

      2023-10-15T14:47:57,360 WARN  [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HStore(846): Failed flushing store file for 1588230740/ns, retrying num=0
      java.nio.channels.ClosedChannelException: null
      	at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
      	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
      	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
      	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
      	at java.io.DataOutputStream.write(DataOutputStream.java:107) ~[?:1.8.0_352]
      	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1045) ~[classes/:?]
      	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1032) ~[classes/:?]
      	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:539) ~[classes/:?]
      	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:615) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1969) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3012) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2720) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5458) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1032) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:966) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7774) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7729) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7704) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7663) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7619) ~[classes/:?]
      	at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:138) ~[classes/:?]
      	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) ~[classes/:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
      	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
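
For reference, the gap between the flush message and the warning can be worked out from the two timestamps quoted above. This is a minimal sketch; the class name `LogGap` and the method `between` are illustrative only and not part of HBase, and the pattern assumes the timestamp layout seen in these two log lines.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class LogGap {
    // Timestamp layout as it appears in the quoted log lines (comma before the millis).
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    // Elapsed time between two log timestamps given as strings.
    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, FMT), LocalDateTime.parse(end, FMT));
    }

    public static void main(String[] args) {
        Duration gap = between("2023-10-15T14:37:46,704", "2023-10-15T14:47:57,360");
        // 610656 ms, i.e. a little over ten minutes between the two messages
        System.out.println(gap.toMillis() + " ms");
    }
}
```

So the opening thread logged nothing for slightly more than ten minutes between starting the flush and reporting the failure.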
      

      The flush is stuck writing data to HDFS.
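
As a point of reference for the exception type only: `java.nio.channels.ClosedChannelException` is what the JDK raises when a write is attempted on a channel that has already been closed. The sketch below reproduces that with a local `FileChannel`. It is a plain-JDK analogy, not HDFS code, and it says nothing about why the DFS output stream was closed in this run. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ClosedChannelDemo {
    // Returns true if writing to an already-closed FileChannel throws ClosedChannelException.
    static boolean writeAfterCloseFails() throws IOException {
        Path tmp = Files.createTempFile("closed-channel-demo", ".bin");
        try {
            FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE);
            ch.close(); // simulate the stream having been closed before the write
            try {
                ch.write(ByteBuffer.wrap("data".getBytes(StandardCharsets.UTF_8)));
                return false; // not reached: the write is rejected
            } catch (ClosedChannelException e) {
                return true;
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write after close rejected: " + writeAfterCloseFails());
    }
}
```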

      The root cause is not yet known; this needs more digging.
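
One generic way to dig further into a hang like this is to capture the stacks of the relevant threads while the test is still stuck. The following is a sketch using only the JDK; `StuckThreadDump` and `dump` are hypothetical names, and the `RS_OPEN_META` name fragment is simply taken from the thread name in the log lines above. It is an assumption about how one might investigate, not something that has been done for this issue.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;

public class StuckThreadDump {
    // Builds a text dump of every live thread whose name contains the given fragment.
    static String dump(String nameFragment) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().contains(nameFragment)) {
                continue;
            }
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // Stand-in for a stuck handler thread; the name mimics the one in the log.
        Thread t = new Thread(() -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // exit quietly
            }
        }, "RS_OPEN_META-demo");
        t.setDaemon(true);
        t.start();
        started.await();
        System.out.println(dump("RS_OPEN_META"));
        release.countDown();
    }
}
```

Taking such a dump a few times during the hang would show whether the opening thread stays parked at the same frame or is slowly making progress.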

          People

            Assignee: Unassigned
            Reporter: Duo Zhang (zhangduo)
            Votes: 0
            Watchers: 2