
HBASE-26644: Spurious compaction failures with file tracker


Details

    • Type: Sub-task
    • Parent: HBASE-26584 Further improvements on StoreFileTracker
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Compaction
    • Labels: None

    Description

      Noticed when running a basic hbase pe randomWrite: compactions fail at various points.
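
      For reference, a minimal way to drive this kind of write load (the flags are standard PerformanceEvaluation options; the row and client counts here are arbitrary):

        hbase pe --nomapred --rows=1000000 randomWrite 10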

      One example:

      2022-01-03 17:41:18,319 ERROR [regionserver/localhost:16020-shortCompactions-0] regionserver.CompactSplit(670): Compaction failed region=TestTable,00000000000000000004054490,1641249249856.2dc7251c6eceb660b9c7bb0b587db913., storeName=2dc7251c6eceb660b9c7bb0b587db913/info0, priority=6, startTime=1641249666161
      java.io.IOException: Root-level entries already added in single-level mode
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeSingleLevelIndex(HFileBlockIndex.java:1136)
        at org.apache.hadoop.hbase.io.hfile.CompoundBloomFilterWriter$MetaWriter.write(CompoundBloomFilterWriter.java:279)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl$1.writeToBlock(HFileWriterImpl.java:713)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeBlock(HFileBlock.java:1205)
        at org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:660)
        at org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:377)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.commitWriter(DefaultCompactor.java:70)
        at org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:386)
        at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:62)
        at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
        at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1141)
        at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2388)
        at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:654)
        at org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:697)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)  
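
      The exception itself comes from a guard in HFileBlockIndex.BlockIndexWriter: writing a single-level index is only legal while no root-level entries exist yet, so a second close (or two writers racing on shared state) will trip it. A minimal runnable sketch of that invariant (my own toy model of the behavior, not HBase source):

        import java.io.IOException;

        // Toy model of the failing invariant (assumption: this mirrors the
        // intent of HFileBlockIndex.BlockIndexWriter, not its actual code).
        class SingleLevelIndexWriter {
          private int rootEntries = 0;

          // Legal only while the root chunk is empty; entries are promoted to
          // the root as part of writing, so a second call must fail.
          void writeSingleLevelIndex() throws IOException {
            if (rootEntries > 0) {
              throw new IOException("Root-level entries already added in single-level mode");
            }
            rootEntries++;
          }
        }

        public class DoubleWriteDemo {
          public static void main(String[] args) throws IOException {
            SingleLevelIndexWriter w = new SingleLevelIndexWriter();
            w.writeSingleLevelIndex(); // first close: succeeds
            w.writeSingleLevelIndex(); // second close: throws the IOException above
          }
        }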

      This isn't a super-critical issue because compactions are retried automatically and appear to eventually succeed. However, when the max storefiles limit is reached, this does cause ingest to hang (as I saw with my modest configuration).
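
      For reference, the storefile limit in question is the blocking store files threshold, set in hbase-site.xml (the property name is a standard HBase setting; 16 is its usual default):

        <property>
          <name>hbase.hstore.blockingStoreFiles</name>
          <!-- Writes to a region block once any of its stores reaches this many
               files, until compaction brings the count back down (or
               hbase.hstore.blockingWaitTime elapses). -->
          <value>16</value>
        </property>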

      We had seen a similar problem in our own testing when backporting this work to HBase 2.4 (not upstream, since the decision was not to backport it), which we eventually tracked down to a bad merge-conflict resolution in the new HFile Cleaner. However, the initial investigation here doesn't show the same exact problem.

      It seems we have some kind of generic race condition. It would be good to add more logging to catch this in the future, since we already have two separate instances of this category of bug.
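
      As a sketch of the kind of logging/guarding meant here (a hypothetical helper of my own, not existing HBase code), recording which thread closed a writer first makes a double close or a racing close show up directly in the failure:

        import java.io.IOException;
        import java.util.concurrent.atomic.AtomicBoolean;

        // Hypothetical close guard: a double (or racing) close fails loudly with
        // both thread names, instead of surfacing later as an index-writer error.
        class GuardedWriter implements AutoCloseable {
          private final AtomicBoolean closed = new AtomicBoolean(false);
          private volatile String firstCloser = "<none>";

          @Override
          public void close() throws IOException {
            if (!closed.compareAndSet(false, true)) {
              throw new IOException("Double close: first by " + firstCloser
                  + ", again by " + Thread.currentThread().getName());
            }
            firstCloser = Thread.currentThread().getName();
            // ... real close work (finish blocks, write bloom/index metadata) ...
          }
        }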


    People

      Assignee: Josh Elser (elserj)
      Reporter: Josh Elser (elserj)
