HBASE-28184: Tailing the WAL is very slow if there are multiple peers.

    Description

Noticed this in one of our production clusters, which has 4 peers.

Due to a sudden ingestion of data, the size of the log queue increased to a peak of 506. We have configured the log roll size to 256 MB. Most of the edits in these WALs were from a table for which replication is disabled.

So all the ReplicationSourceWALReader threads had to do was read through the WALs and NOT replicate the edits. Still, it took 12 hours to drain the queue.

Took a few jstacks and found that ReplicationSourceWALReader was waiting to acquire rollWriterLock here:

      "regionserver/<rs>,1" #1036 daemon prio=5 os_prio=0 tid=0x00007f44b374e800 nid=0xbd7f waiting on condition [0x00007f37b4d19000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00007f3897a3e150> (a java.util.concurrent.locks.ReentrantLock$FairSync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:837)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:872)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1202)
              at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:228)
              at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
              at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.getLogFileSizeIfBeingWritten(AbstractFSWAL.java:1102)
              at org.apache.hadoop.hbase.wal.WALProvider.lambda$null$0(WALProvider.java:128)
              at org.apache.hadoop.hbase.wal.WALProvider$$Lambda$177/1119730685.apply(Unknown Source)
              at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
              at java.util.ArrayList$ArrayListSpliterator.tryAdvance(ArrayList.java:1361)
              at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
              at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:499)
              at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:486)
              at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
              at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
              at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
              at java.util.stream.ReferencePipeline.findAny(ReferencePipeline.java:536)
              at org.apache.hadoop.hbase.wal.WALProvider.lambda$getWALFileLengthProvider$2(WALProvider.java:129)
              at org.apache.hadoop.hbase.wal.WALProvider$$Lambda$140/1246380717.getLogFileSizeIfBeingWritten(Unknown Source)
              at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.readNextEntryAndRecordReaderPosition(WALEntryStream.java:260)
              at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.tryAdvanceEntry(WALEntryStream.java:172)
              at org.apache.hadoop.hbase.replication.regionserver.WALEntryStream.hasNext(WALEntryStream.java:101)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.readWALEntries(ReplicationSourceWALReader.java:222)
              at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:157)
      

 All the peers contend for this lock during every batch read.
Look at the code snippet below. We guard this section with rollWriterLock, which is only needed when we are replicating the active WAL file. In our case we are NOT replicating the active WAL file, yet we still acquire the lock only to return OptionalLong.empty().

  /**
   * if the given {@code path} is being written currently, then return its length.
   * <p>
   * This is used by replication to prevent replicating unacked log entries. See
   * https://issues.apache.org/jira/browse/HBASE-14004 for more details.
   */
  @Override
  public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
    rollWriterLock.lock();
    try {
      // ... body elided: returns the length only when path is the WAL
      // currently being written, otherwise OptionalLong.empty() ...
      ...
    } finally {
      rollWriterLock.unlock();
    }
  }
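
For context on why every batch read hits this lock: per the WALProvider frames in the stack above, the length provider handed to WALEntryStream probes each WAL managed by the provider in turn, and each probe takes that WAL's rollWriterLock. The standalone sketch below only illustrates that shape; the WalLengthProviderChain class, the Wal interface and the lengthProvider helper are invented for this example and are not the HBase source.

  import java.util.List;
  import java.util.OptionalLong;
  import java.util.function.Function;

  public class WalLengthProviderChain {

    /** Stand-in for a single WAL; the real method takes that WAL's rollWriterLock internally. */
    interface Wal {
      OptionalLong getLogFileSizeIfBeingWritten(String path);
    }

    /**
     * Same shape as the provider in the stack trace: for every length lookup, probe the WALs
     * one by one (each probe locks that WAL) until one reports the path as being written.
     * Every ReplicationSourceWALReader (one per peer) does this for every batch it reads.
     */
    static Function<String, OptionalLong> lengthProvider(List<Wal> wals) {
      return path -> wals.stream()
          .map(wal -> wal.getLogFileSizeIfBeingWritten(path))
          .filter(OptionalLong::isPresent)
          .findAny()
          .orElse(OptionalLong.empty());
    }
  }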

We can check the size of the log queue, and if it is greater than 1 (meaning the file being read has already been rolled and cannot be the active WAL) we can return early without acquiring the lock.
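
A minimal standalone sketch of that idea, assuming the reader knows its log queue size; the EarlyReturnSketch class, the method names and the logQueueSize/activeWalPath parameters are invented for illustration and are not the actual patch.

  import java.util.OptionalLong;
  import java.util.concurrent.locks.ReentrantLock;

  public class EarlyReturnSketch {

    private final ReentrantLock rollWriterLock = new ReentrantLock(true); // fair, as in the jstack
    private volatile long activeWalLength;

    /** Current behaviour: always take rollWriterLock, even for already-rolled files. */
    OptionalLong lengthIfBeingWrittenLocked(String path, String activeWalPath) {
      rollWriterLock.lock();
      try {
        return path.equals(activeWalPath) ? OptionalLong.of(activeWalLength) : OptionalLong.empty();
      } finally {
        rollWriterLock.unlock();
      }
    }

    /** Proposed behaviour: a cheap queue-size pre-check lets readers of rolled WALs skip the lock. */
    OptionalLong lengthIfBeingWritten(String path, String activeWalPath, int logQueueSize) {
      if (logQueueSize > 1) {
        // More than one WAL queued => the file being read has already been rolled,
        // so it cannot be the file currently being written. No lock needed.
        return OptionalLong.empty();
      }
      return lengthIfBeingWrittenLocked(path, activeWalPath);
    }
  }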

People

  Assignee: Rushabh Shah (shahrs87)
  Reporter: Rushabh Shah (shahrs87)