HBASE-26026: HBase Write may be stuck forever when using CompactingMemStore

Details

    • Reviewed

    Description

      I have occasionally observed HBase writes getting stuck in my HBase cluster, which has CompactingMemStore enabled. I have reproduced the problem with a unit test in my PR.
      The problem is caused by CompactingMemStore.checkAndAddToActiveSize:

      425   private boolean checkAndAddToActiveSize(MutableSegment currActive, Cell cellToAdd,
      426      MemStoreSizing memstoreSizing) {
      427    if (shouldFlushInMemory(currActive, cellToAdd, memstoreSizing)) {
      428      if (currActive.setInMemoryFlushed()) {
      429        flushInMemory(currActive);
      430        if (setInMemoryCompactionFlag()) {
      431         // The thread is dispatched to do in-memory compaction in the background
                    ......
       }
      

      In line 427, shouldFlushInMemory checks whether currActive.getDataSize plus the size of cellToAdd exceeds CompactingMemStore.inmemoryFlushSize. If it does, currActive should be flushed, so currActive.setInMemoryFlushed() is invoked in line 428:

      public boolean setInMemoryFlushed() {
        return flushed.compareAndSet(false, true);
      }
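      The checks in lines 427 and 428 can be modeled in a few lines of Java. This is a minimal sketch, not HBase code: ToyActiveSegment, FlushClaimDemo, the threshold of 100 and the cell sizes are hypothetical, and only the "current size plus new cell exceeds the threshold" test and the compareAndSet(false, true) flag come from the issue. It shows that the flag is one-shot: the first writer to cross the threshold wins it, and every later writer gets false and never reaches flushInMemory.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for an active segment; not HBase's MutableSegment. */
class ToyActiveSegment {
  final AtomicLong dataSize = new AtomicLong();
  private final AtomicBoolean flushed = new AtomicBoolean(false);

  // Same idea as setInMemoryFlushed(): only the first caller flips false -> true.
  boolean setInMemoryFlushed() {
    return flushed.compareAndSet(false, true);
  }
}

class FlushClaimDemo {
  // Hypothetical threshold playing the role of CompactingMemStore.inmemoryFlushSize.
  static final long IN_MEMORY_FLUSH_SIZE = 100;

  // Modeled on line 427: would adding this cell push the data size past the threshold?
  static boolean shouldFlushInMemory(ToyActiveSegment active, long cellSize) {
    return active.dataSize.get() + cellSize > IN_MEMORY_FLUSH_SIZE;
  }

  // Modeled on lines 427-428: true only for the one writer that wins the flush.
  static boolean tryClaimFlush(ToyActiveSegment active, long cellSize) {
    return shouldFlushInMemory(active, cellSize) && active.setInMemoryFlushed();
  }

  public static void main(String[] args) {
    ToyActiveSegment active = new ToyActiveSegment();
    active.dataSize.set(95);
    System.out.println(tryClaimFlush(active, 10)); // true: first writer over the limit
    System.out.println(tryClaimFlush(active, 10)); // false: the flag is already set
  }
}
```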
      

      After currActive.flushed has been successfully set to true, flushInMemory(currActive) in line 429 above invokes CompactingMemStore.pushActiveToPipeline:

      protected void pushActiveToPipeline(MutableSegment currActive) {
        if (!currActive.isEmpty()) {
          pipeline.pushHead(currActive);
          resetActive();
        }
      }
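      The guard in pushActiveToPipeline can be replayed in isolation. In this sketch GuardSegment and GuardedMemStore are hypothetical simplifications, pipeline.pushHead is approximated by Deque.addFirst, and resetActive() by installing a fresh segment. It shows the state that matters for this bug: when the active segment is empty, the call returns without doing anything, so the same segment stays active with its flushed flag still true.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical toy segment: a cell set plus the one-shot flushed flag. */
class GuardSegment {
  final ConcurrentSkipListSet<String> cellSet = new ConcurrentSkipListSet<>();
  final AtomicBoolean flushed = new AtomicBoolean(false);

  boolean isEmpty() {
    return cellSet.isEmpty();
  }
}

/** Hypothetical toy memstore with the same guard as pushActiveToPipeline above. */
class GuardedMemStore {
  final Deque<GuardSegment> pipeline = new ArrayDeque<>();
  GuardSegment active = new GuardSegment();

  void pushActiveToPipeline(GuardSegment currActive) {
    if (!currActive.isEmpty()) {
      pipeline.addFirst(currActive); // corresponds to pipeline.pushHead(currActive)
      active = new GuardSegment();   // corresponds to resetActive()
    }
    // else: nothing happens, and currActive stays the active segment
  }

  public static void main(String[] args) {
    GuardedMemStore store = new GuardedMemStore();
    GuardSegment first = store.active;
    first.flushed.set(true);           // a writer has already won setInMemoryFlushed()
    store.pushActiveToPipeline(first); // skipped because no cell is in cellSet yet
    System.out.println(store.active == first); // true
    System.out.println(first.flushed.get());   // true
    System.out.println(store.pipeline.size()); // 0
  }
}
```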
      

      In the CompactingMemStore.pushActiveToPipeline method above, if currActive.cellSet is empty, nothing is done. Because writes are concurrent, and because a write first adds the cell size to currActive.getDataSize and only then actually adds the cell to currActive.cellSet, it is possible that currActive.getDataSize can no longer accommodate cellToAdd while currActive.cellSet is still empty, because pending writes have not yet added their cells to currActive.cellSet.
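      The window between "size added" and "cell inserted" can be made deterministic with two latches. This is a hypothetical model (RaceSegment, RaceWindowDemo, and the sizes 60 and 50 are made up), assuming only the ordering described above: a writer first adds its size and later inserts its cell. Pausing the writer between the two steps shows dataSize already at 60 while cellSet is still empty.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical toy segment: size accounting and cell storage are separate steps. */
class RaceSegment {
  final AtomicLong dataSize = new AtomicLong();
  final ConcurrentSkipListSet<String> cellSet = new ConcurrentSkipListSet<>();
}

class RaceWindowDemo {
  /** Returns {dataSize, number of cells} as seen while a write is half-done. */
  static long[] observeWindow() {
    RaceSegment seg = new RaceSegment();
    CountDownLatch sizeAdded = new CountDownLatch(1);
    CountDownLatch mayInsert = new CountDownLatch(1);

    Thread pendingWriter = new Thread(() -> {
      seg.dataSize.addAndGet(60); // step 1: account for the cell's size
      sizeAdded.countDown();
      try {
        mayInsert.await();        // simulate the writer being descheduled here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      seg.cellSet.add("row-a");   // step 2: actually insert the cell
    });
    pendingWriter.start();
    try {
      sizeAdded.await();
      // What a second writer (or the flush path) observes inside the window.
      long[] seen = {seg.dataSize.get(), seg.cellSet.size()};
      mayInsert.countDown();
      pendingWriter.join();
      return seen;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    long[] seen = observeWindow();
    // With a hypothetical threshold of 100, a second 50-byte cell would trigger the
    // in-memory flush (60 + 50 > 100) even though cellSet is still empty.
    System.out.println("dataSize=" + seen[0] + ", cells=" + seen[1]); // dataSize=60, cells=0
  }
}
```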
      So if currActive.cellSet is empty at that moment, no new ActiveSegment is created and new writes continue to target currActive. But because currActive.flushed is already true, currActive can never enter flushInMemory(currActive) again, so a new ActiveSegment can never be created. In the end, all writes get stuck.
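      The consequence can be replayed single-threaded. In this sketch StuckSegment, StuckDemo and the threshold of 100 are hypothetical, and presetting dataSize to 60 stands in for a pending writer that has reserved its size but not yet inserted its cell. The next write wins the flushed flag, the guarded push is skipped, and from then on no write can switch to a new segment while the old one keeps growing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical toy segment (not HBase code). */
class StuckSegment {
  final AtomicLong dataSize = new AtomicLong();
  final List<String> cells = new ArrayList<>();
  final AtomicBoolean flushed = new AtomicBoolean(false);
}

/** Single-threaded replay of the stuck state; names and threshold are hypothetical. */
class StuckDemo {
  static final long FLUSH_SIZE = 100;

  StuckSegment active = new StuckSegment();
  int segmentSwitches = 0;

  // Simplified write: threshold check plus one-shot flag, then size and cell.
  void write(String cell, long size) {
    StuckSegment curr = active;
    if (curr.dataSize.get() + size > FLUSH_SIZE && curr.flushed.compareAndSet(false, true)) {
      pushActiveToPipeline(curr); // only the writer that wins the flag gets here
    }
    StuckSegment target = active; // a new segment only if the push actually happened
    target.dataSize.addAndGet(size);
    target.cells.add(cell);
  }

  // Same guard as the original method: an empty segment is left in place.
  void pushActiveToPipeline(StuckSegment curr) {
    if (!curr.cells.isEmpty()) {
      active = new StuckSegment(); // pipeline bookkeeping omitted
      segmentSwitches++;
    }
  }

  public static void main(String[] args) {
    StuckDemo store = new StuckDemo();
    StuckSegment first = store.active;
    first.dataSize.set(60);   // a pending writer reserved 60 bytes, cell not inserted yet
    store.write("row-b", 50); // wins the flag, but cells is empty, so the push is skipped
    for (int i = 0; i < 1000; i++) {
      store.write("row-" + i, 50); // the flag is already true: nobody can flush again
    }
    System.out.println(store.active == first); // true
    System.out.println(store.segmentSwitches); // 0
    System.out.println(first.dataSize.get());  // keeps growing: 50110
  }
}
```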

      In my opinion, once currActive.flushed is set to true, currActive can no longer be used as the ActiveSegment. Moreover, because of concurrent pending writes, we can only safely tell whether currActive is empty after currActive.updatesLock.writeLock() has been acquired (i.e. after currActive.waitForUpdates is called) in CompactingMemStore.inMemoryCompaction.
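      The role of waitForUpdates can be illustrated with java.util.concurrent.locks.ReentrantReadWriteLock. This sketch assumes that writers hold the read lock of updatesLock for their whole update and that waitForUpdates acquires the write lock; the first part is an assumption of the sketch, and LockedSegment and WaitForUpdatesDemo are hypothetical. A cellSet.isEmpty() check made while a write is in flight reports true, while the same check made after the write lock is acquired sees the pending cell.

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical toy segment; writers hold the read lock for their whole update. */
class LockedSegment {
  final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  final AtomicLong dataSize = new AtomicLong();
  final ConcurrentSkipListSet<String> cellSet = new ConcurrentSkipListSet<>();

  // Blocks until no writer holds the read lock, i.e. all pending updates are done.
  void waitForUpdates() {
    updatesLock.writeLock().lock();
  }

  void doneWaiting() {
    updatesLock.writeLock().unlock();
  }
}

class WaitForUpdatesDemo {
  /** Returns {isEmpty before waiting, isEmpty after waiting}. */
  static boolean[] checkEmptiness() {
    LockedSegment seg = new LockedSegment();
    CountDownLatch sizeAdded = new CountDownLatch(1);
    CountDownLatch mayInsert = new CountDownLatch(1);

    Thread pendingWriter = new Thread(() -> {
      seg.updatesLock.readLock().lock();
      try {
        seg.dataSize.addAndGet(60); // size is accounted first...
        sizeAdded.countDown();
        mayInsert.await();
        seg.cellSet.add("row-a");   // ...and the cell is inserted later
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        seg.updatesLock.readLock().unlock();
      }
    });
    pendingWriter.start();
    try {
      sizeAdded.await();
      boolean emptyBefore = seg.cellSet.isEmpty(); // true, but not a reliable answer
      mayInsert.countDown();
      seg.waitForUpdates();                        // returns only after the writer unlocks
      boolean emptyAfter;
      try {
        emptyAfter = seg.cellSet.isEmpty();        // reliable: no update is in flight
      } finally {
        seg.doneWaiting();
      }
      pendingWriter.join();
      return new boolean[] {emptyBefore, emptyAfter};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    boolean[] r = checkEmptiness();
    System.out.println("before=" + r[0] + ", after=" + r[1]); // before=true, after=false
  }
}
```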

      My fix is to remove the if (!currActive.isEmpty()) check here and leave the check to the background InMemoryCompactionRunnable, after currActive.waitForUpdates is called.

      An alternative fix would be to use a synchronization mechanism in the checkAndAddToActiveSize method: block all writes, wait for all pending writes to complete (i.e. call currActive.waitForUpdates), and if currActive is still empty, set currActive.flushed back to false. However, I am not inclined to use such heavy synchronization in the write path. I think we should keep the CompactingMemStore.add method lockless, as it is now, and leave currActive.waitForUpdates in the background InMemoryCompactionRunnable.
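      The proposed fix can be sketched in the same toy style. FixSegment and FixedMemStore are hypothetical, and dropping an empty segment from the pipeline is an assumption made for illustration; the issue only says the emptiness check moves to the background InMemoryCompactionRunnable, after waitForUpdates. The point is that pushActiveToPipeline no longer looks at isEmpty(), so a segment whose flushed flag is set is always replaced and writes move on to a new active segment.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical toy segment (not HBase's MutableSegment). */
class FixSegment {
  final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
  final ConcurrentSkipListSet<String> cellSet = new ConcurrentSkipListSet<>();
  final AtomicBoolean flushed = new AtomicBoolean(false);
}

/** Hypothetical toy memstore sketching the proposed fix. */
class FixedMemStore {
  final ConcurrentLinkedDeque<FixSegment> pipeline = new ConcurrentLinkedDeque<>();
  volatile FixSegment active = new FixSegment();

  // Proposed shape: no isEmpty() guard, so a flushed segment is always retired.
  void pushActiveToPipeline(FixSegment currActive) {
    pipeline.addFirst(currActive);
    active = new FixSegment(); // stands in for resetActive()
  }

  // Background step, cf. InMemoryCompactionRunnable: wait for pending writers first,
  // and only then decide whether the pushed segment is empty.
  void inMemoryCompaction() {
    FixSegment head = pipeline.peekFirst();
    if (head == null) {
      return;
    }
    head.updatesLock.writeLock().lock(); // plays the role of waitForUpdates()
    try {
      if (head.cellSet.isEmpty()) {
        pipeline.remove(head); // hypothetical handling: drop the empty segment
      }
      // A real implementation would compact the non-empty pipeline here.
    } finally {
      head.updatesLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    FixedMemStore store = new FixedMemStore();
    FixSegment first = store.active;
    first.flushed.set(true);           // the flag was won while cellSet was still empty
    store.pushActiveToPipeline(first); // a fresh active segment is installed anyway
    System.out.println(store.active != first);    // true: new writes can proceed
    store.inMemoryCompaction();
    System.out.println(store.pipeline.isEmpty()); // true: empty segment discarded
  }
}
```

      Keeping the isEmpty() decision behind the write lock in the background step means the write path itself takes no extra lock, which matches the preference above for keeping CompactingMemStore.add lockless.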

            People

              comnetwork chenglei
              Votes: 0
              Watchers: 9
