  HBase / HBASE-22684 (The log rolling request maybe canceled immediately in LogRoller due to a race) / HBASE-26435

[branch-1] The log rolling request maybe canceled immediately in LogRoller due to a race


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.7.2
    • Component/s: wal
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Saw this issue in our internal 1.6 branch.
      All writes to this RS were failing because the underlying HDFS file was corrupt. The condition healed itself after 1 hour (the value of the hbase.regionserver.logroll.period conf key).
      The WAL was rolled, but the new WAL file was not writable, and the following errors were also logged:

      2021-11-03 19:20:19,503 WARN  [.168:60020.logRoller] hdfs.DFSClient - Error while syncing
      java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
              at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
              at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
      
      2021-11-03 19:20:19,507 WARN  [.168:60020.logRoller] wal.FSHLog - pre-sync failed but an optimization so keep going
      java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
              at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
              at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
      

      Since the new WAL file was not writable, appends to that file started failing immediately after it was rolled.

      2021-11-03 19:20:19,677 INFO  [.168:60020.logRoller] wal.FSHLog - Rolled WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635965392022 with entries=253234, filesize=425.67 MB; new WAL /hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389
      
      
      2021-11-03 19:20:19,690 WARN  [020.append-pool17-t1] wal.FSHLog - Append sequenceId=1962661783, requesting roll of WAL
      java.io.IOException: Could not get block locations. Source file "/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635967219389" - Aborting...
              at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1466)
              at org.apache.hadoop.hdfs.DataStreamer.processDatanodeError(DataStreamer.java:1251)
              at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:670)
      
      
      2021-11-03 19:20:19,690 INFO  [.168:60020.logRoller] wal.FSHLog - Archiving hdfs://prod-EMPTY-hbase2a/hbase/WALs/<rs-name>,60020,1635567166484/<rs-name>%2C60020%2C1635567166484.1635960792837 to hdfs://prod-EMPTY-hbase2a/hbase/oldWALs/hbase2a-dnds1-232-ukb.ops.sfdc.net%2C60020%2C1635567166484.1635960792837
      

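      Within the LogRoller thread, we always reset the rollLog flag after the rollWriter call is complete.
      FSHLog#rollWriter does several things, such as replacing the writer and archiving old logs. If the append thread fails to write to the new file (and requests another roll) while the LogRoller thread is still archiving old logs, that request is lost: when the in-progress rollWriter call returns, LogRoller resets the flag to false, wiping out the request that arrived during the call. In the logs above, the append failure that requests a roll (19:20:19,690) is logged at the same moment the roller is still archiving old WALs.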
      Relevant code: https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/LogRoller.java#L183-L203
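      To make the lost-request ordering concrete, here is a minimal, deterministic Java sketch. It is not HBase code: BuggyRollerSketch, rollRequested, runOnce and the duringArchive callback are hypothetical names. A callback invoked inside rollWriter stands in for the append thread failing against the new WAL and requesting a roll while old WALs are still being archived.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the branch-1 LogRoller ordering; not HBase code.
class BuggyRollerSketch {
    // Stands in for the rollLog flag that roll requestors set.
    final AtomicBoolean rollRequested = new AtomicBoolean(false);
    int rollsPerformed = 0;

    // Called by the (simulated) append thread when a write to the current WAL fails.
    void requestRoll() {
        rollRequested.set(true);
    }

    // Stands in for FSHLog#rollWriter: install a new writer, then archive old WALs.
    // duringArchive models another thread acting while the roll is still in progress.
    void rollWriter(Runnable duringArchive) {
        rollsPerformed++;        // new writer is in place
        duringArchive.run();     // append to the new WAL fails and calls requestRoll()
    }

    // One iteration of the roller loop with the problematic ordering.
    void runOnce(Runnable duringArchive) {
        if (!rollRequested.get()) {
            return;              // nothing requested, no roll
        }
        rollWriter(duringArchive);
        rollRequested.set(false); // BUG: also clears the request raised during rollWriter
    }

    public static void main(String[] args) {
        BuggyRollerSketch roller = new BuggyRollerSketch();
        roller.requestRoll();
        roller.runOnce(roller::requestRoll); // new WAL is broken; append asks for another roll
        roller.runOnce(() -> { });           // next pass: is the second request still pending?
        System.out.println("rolls=" + roller.rollsPerformed
            + " pending=" + roller.rollRequested.get());
    }
}
```

      Running main prints rolls=1 pending=false: the request raised during the first roll is cleared and never acted on, so in this model the roller stays on the unwritable WAL until something else (such as the periodic roll) triggers a new one.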

      We need to reset the rollLog flag before we start rolling the WAL.
      This was fixed in branch-2 and master via HBASE-22684, but the fix was never applied to branch-1.
      Also, branch-2 has a different multi-WAL implementation, so the HBASE-22684 patch cannot be applied cleanly to branch-1.
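      A minimal sketch of the reordering, in a hypothetical simplified model of the roller: FixedRollerSketch and its members are illustrative names, not HBase code, and a callback run inside rollWriter stands in for the append thread requesting a roll mid-roll. The sketch uses AtomicBoolean#getAndSet(false) to read and clear the request in one step before rolling; that is an illustrative choice and not necessarily how the HBASE-22684 patch is written.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of a roller that clears the flag before rolling; not HBase code.
class FixedRollerSketch {
    // Stands in for the rollLog flag that roll requestors set.
    final AtomicBoolean rollRequested = new AtomicBoolean(false);
    int rollsPerformed = 0;

    // Called by the (simulated) append thread when a write to the current WAL fails.
    void requestRoll() {
        rollRequested.set(true);
    }

    // Stands in for FSHLog#rollWriter; duringArchive models a concurrent roll request.
    void rollWriter(Runnable duringArchive) {
        rollsPerformed++;
        duringArchive.run();
    }

    void runOnce(Runnable duringArchive) {
        // Read and clear the request atomically *before* rolling, so any request
        // raised while rollWriter runs stays set for the next pass.
        if (!rollRequested.getAndSet(false)) {
            return;
        }
        rollWriter(duringArchive);
    }

    public static void main(String[] args) {
        FixedRollerSketch roller = new FixedRollerSketch();
        roller.requestRoll();
        roller.runOnce(roller::requestRoll); // new WAL is broken; append asks for another roll
        roller.runOnce(() -> { });           // the pending request triggers a second roll
        System.out.println("rolls=" + roller.rollsPerformed
            + " pending=" + roller.rollRequested.get());
    }
}
```

      Here main prints rolls=2 pending=false: the request raised during the first roll survives the reset and causes a second roll on the next pass instead of waiting for the periodic roll.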


            People

              Assignee: Rushabh Shah (shahrs87)
              Reporter: Rushabh Shah (shahrs87)
              Votes: 0
              Watchers: 2
