Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Description
This has shown up several times recently in pre-commit results.
According to the log output, the test is stuck in testRegionServerSessionExpired.
When replaying the recovered edits for the meta region, we need to flush the memstore at the end, and that flush gets stuck, which causes the test to time out.
This is the last log message for opening hbase:meta:
2023-10-15T14:37:46,704 INFO [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HRegion(2885): Flushing 1588230740 4/4 column families, dataSize=74 B heapSize=1.22 KB
And when the test timed out, we saw this:
2023-10-15T14:47:57,360 WARN [RS_OPEN_META-regionserver/2c0085825d5f:0-0 {event_type=M_RS_OPEN_META, pid=9}] regionserver.HStore(846): Failed flushing store file for 1588230740/ns, retrying num=0
java.nio.channels.ClosedChannelException: null
	at org.apache.hadoop.hdfs.ExceptionLastSeen.throwException4Close(ExceptionLastSeen.java:73) ~[hadoop-hdfs-client-3.2.4.jar:?]
	at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:153) ~[hadoop-hdfs-client-3.2.4.jar:?]
	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:105) ~[hadoop-common-3.2.4.jar:?]
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) ~[hadoop-common-3.2.4.jar:?]
	at java.io.DataOutputStream.write(DataOutputStream.java:107) ~[?:1.8.0_352]
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:1045) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:1032) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.writeInlineBlocks(HFileWriterImpl.java:539) ~[classes/:?]
	at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:615) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:70) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:74) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:828) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1969) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:3012) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2720) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:5458) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1032) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:966) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7774) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7729) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7704) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7663) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7619) ~[classes/:?]
	at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:138) ~[classes/:?]
	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104) ~[classes/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
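The flush is stuck writing data to HDFS: it started at 14:37:46 and only failed with a ClosedChannelException at 14:47:57, about ten minutes later.
The root cause is not clear yet; more digging is needed.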
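To put a number on how long the flush was stuck, the gap between the two log timestamps quoted above can be computed. This is only a small sketch: the class name FlushStallGap and the helper gap are made up for illustration, and the timestamp pattern is assumed from the two log lines in this report, not taken from the HBase logging configuration.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of HBase: measures the time between two log lines.
public class FlushStallGap {
    // Pattern assumed from the log lines in this report, e.g. 2023-10-15T14:37:46,704
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss,SSS");

    // Returns the elapsed time between two log timestamps in the format above.
    static Duration gap(String start, String end) {
        return Duration.between(
            LocalDateTime.parse(start, LOG_TS),
            LocalDateTime.parse(end, LOG_TS));
    }

    public static void main(String[] args) {
        // "Flushing 1588230740 ..." line and "Failed flushing store file ..." line
        Duration stalled = gap("2023-10-15T14:37:46,704", "2023-10-15T14:47:57,360");
        System.out.println(stalled.toMillis() + " ms"); // prints "610656 ms"
    }
}
```

That is roughly 10 minutes and 11 seconds between the start of the flush and the ClosedChannelException.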