  1. HBase
  2. HBASE-27850

TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.6
    • Fix Version/s: None
    • Component/s: regionserver
    • Labels: None
    • Environment: hbase 2.2.6, hadoop 3.3.1

    Description

      On a RegionServer in an RSGroup that hosts only one table, the write call queue became blocked at a certain moment. From that point on, the read and write QPS of the table dropped to 0 and clients could neither read nor write the table. Starting at the moment the RS call queue became blocked, the following error was reported continuously in the log:
       
      2023-05-08 12:42:27,310 ERROR [MemStoreFlusher.2] regionserver.MemStoreFlusher: Cache flush failed for region user_feature_v2,eacf_1658057555,1660314723816.2376cc2326b5372131cc530b115d959a.
      org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?
              at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:155)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:743)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:625)
              at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:602)
              at org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
              at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
              at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
              at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
              at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2409)
              at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
              at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
              at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
              at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
              at java.lang.Thread.run(Thread.java:748)
      The memstore data on this node could not be flushed, because the WAL sync that precedes the flush timed out (see the stack trace above). Other metrics of the node were normal, and HDFS was not under pressure. After the blocked node was restarted, the table returned to normal.
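
      To make the failure mode in the stack trace easier to reason about, here is a minimal Java sketch of a bounded wait on a sync result. It is a simplified model, not the HBase `SyncFuture` code: the class `SyncTimeoutSketch` and the helper `waitForSync` are hypothetical, a plain `IOException` stands in for `TimeoutIOException`, and a never-completed `CompletableFuture` stands in for a WAL whose sync never finishes. As far as I know, in HBase 2.x the real wait is bounded by `hbase.regionserver.wal.sync.timeout` (default 5 minutes), which would match the 300000 ms in the log, but please verify that against the 2.2.6 source.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Simplified model of a bounded wait on a WAL sync. NOT the HBase implementation. */
public class SyncTimeoutSketch {

  /**
   * Hypothetical helper: blocks the calling thread for at most timeoutMs
   * while waiting for the sync of the given txid to complete.
   */
  static long waitForSync(CompletableFuture<Long> syncResult, long txid, long timeoutMs)
      throws IOException {
    try {
      return syncResult.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Same message shape as the log line in this issue (assumed, for illustration).
      throw new IOException("Failed to get sync result after " + timeoutMs
          + " ms for txid=" + txid + ", WAL system stuck?");
    } catch (ExecutionException e) {
      throw new IOException("Sync failed for txid=" + txid, e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new IOException("Interrupted while waiting for sync of txid=" + txid, e);
    }
  }

  public static void main(String[] args) {
    // A future that is never completed models a sync that never returns.
    CompletableFuture<Long> stuck = new CompletableFuture<>();
    try {
      // 200 ms keeps the demo short; the log in this issue shows 300000 ms.
      waitForSync(stuck, 16920651960L, 200L);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

      In this model the caller is held for the full timeout before it fails. If the real flush path behaves the same way, each `MemStoreFlusher` thread would be blocked for up to 5 minutes per attempt while the WAL is stuck, which is consistent with the error being logged continuously.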
       

      Attachments

        1. 49151.log1
          1.19 MB
          longping_jie

          People

            Assignee: Unassigned
            Reporter: leo_jie longping_jie
            Votes: 0
            Watchers: 5
