HBASE-24625

AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length.


Details

      We add a method getSyncedLength in the WALProvider.WriterBase interface for the WALFileLengthProvider used by replication. Consider the case where we use AsyncFSWAL: we write to 3 DNs concurrently, and by the visibility guarantee of HDFS the data becomes available as soon as it arrives at a DN, since every DN considers itself the last one in the pipeline. This means replication may read uncommitted data and replicate it to the remote cluster, causing data inconsistency. WriterBase#getLength may return a length that includes bytes still sitting in the HDFS client buffer and not yet successfully synced to HDFS, so we use WriterBase#getSyncedLength to return the length successfully synced to HDFS, and the replication thread may only read the WAL file currently being written up to this length.
      See also HBASE-14004 and this document for more details:
      https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#

      Before this patch, replication may read uncommitted data and replicate it to the slave cluster, causing data inconsistency between the master and slave clusters. Without this patch applied, FSHLog can be used instead of AsyncFSWAL to reduce the probability of inconsistency.
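      As a rough illustration of the change described above (a minimal sketch only; the exact signatures in the committed patch may differ), the writer exposes a synced length alongside its buffered length, and the length reported to replication comes from the former:

        // Sketch of the writer-side API described in this release note
        // (illustrative, not the exact code of the patch).
        public interface WriterBase extends java.io.Closeable {
          long getLength();       // may include bytes still sitting in the HDFS client buffer
          long getSyncedLength(); // only bytes already acknowledged as synced to HDFS
        }

        // The length provider used by replication would then report the synced length, e.g.:
        //   return writer != null ? OptionalLong.of(writer.getSyncedLength()) : OptionalLong.empty();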
          

    Description

      In HBASE-14004 we introduced the WALFileLengthProvider interface to track the length of the WAL file currently being written ourselves. The WALEntryStream used by ReplicationSourceWALReader may only read bytes up to WALFileLengthProvider.getLogFileSizeIfBeingWritten if the WAL file is currently being written on the same RegionServer.
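      For illustration, the rule introduced by HBASE-14004 can be stated with the following hypothetical helper (not the actual WALEntryStream code):

        import java.util.OptionalLong;

        // Hypothetical helper: while the WAL file is still being written on this
        // RegionServer, the replication reader must not read past the length
        // reported by WALFileLengthProvider.getLogFileSizeIfBeingWritten.
        final class ReadLimit {
          private ReadLimit() {
          }

          static boolean safeToReadUpTo(OptionalLong lengthIfBeingWritten, long desiredPosition) {
            if (!lengthIfBeingWritten.isPresent()) {
              // The file is no longer being written, so there is no limit to enforce.
              return true;
            }
            // Otherwise only positions at or below the reported length are safe to read.
            return desiredPosition <= lengthIfBeingWritten.getAsLong();
          }
        }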

      AsyncFSWAL implements WALFileLengthProvider via AbstractFSWAL.getLogFileSizeIfBeingWritten, as follows:

        public OptionalLong getLogFileSizeIfBeingWritten(Path path) {
          rollWriterLock.lock();
          try {
            Path currentPath = getOldPath();
            if (path.equals(currentPath)) {
              W writer = this.writer;
              // Reports the current writer length for the WAL file that is still being written.
              return writer != null ? OptionalLong.of(writer.getLength()) : OptionalLong.empty();
            } else {
              return OptionalLong.empty();
            }
          } finally {
            rollWriterLock.unlock();
          }
        }
      

      For AsyncFSWAL, the above AsyncFSWAL.writer is an AsyncProtobufLogWriter, and AsyncProtobufLogWriter.getLength is as follows:

          public long getLength() {
              return length.get();
          }
      

      But for AsyncProtobufLogWriter, every append increases the above AsyncProtobufLogWriter.length, even though the following AsyncProtobufLogWriter.append method only appends the WALEntry to FanOutOneBlockAsyncDFSOutput.buf:

        public void append(Entry entry) {
          int buffered = output.buffered();
          try {
            entry.getKey().getBuilder(compressor).setFollowingKvCount(entry.getEdit().size()).build()
              .writeDelimitedTo(asyncOutputWrapper);
          } catch (IOException e) {
            throw new AssertionError("should not happen", e);
          }
          try {
            for (Cell cell : entry.getEdit().getCells()) {
              cellEncoder.write(cell);
            }
          } catch (IOException e) {
            throw new AssertionError("should not happen", e);
          }
          // length grows as soon as the entry is buffered in FanOutOneBlockAsyncDFSOutput,
          // before the bytes are actually synced to the DataNodes.
          length.addAndGet(output.buffered() - buffered);
        }
      

      That is to say, AsyncFSWAL.getLogFileSizeIfBeingWritten does not reflect the file length that has actually been synced to the underlying HDFS, which is not what replication expects.
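      A sketch of the direction the fix takes (field and method names here are illustrative, not necessarily those of the committed patch): keep a separate counter that is only advanced when a sync to HDFS completes, and expose it through getSyncedLength for WALFileLengthProvider to report:

        import java.util.concurrent.atomic.AtomicLong;

        // Illustrative sketch: track the buffered length and the synced length separately.
        class SyncedLengthTracker {
          private final AtomicLong length = new AtomicLong();       // bytes handed to the output, possibly still buffered
          private final AtomicLong syncedLength = new AtomicLong(); // bytes acknowledged as synced to HDFS

          void onAppend(int bytesBuffered) {
            length.addAndGet(bytesBuffered);
          }

          void onSyncCompleted(long syncedUpTo) {
            // Only move forward; a sync that completes out of order must not shrink the value.
            syncedLength.accumulateAndGet(syncedUpTo, Math::max);
          }

          long getLength() {
            return length.get();
          }

          long getSyncedLength() {
            return syncedLength.get();
          }
        }

      getLogFileSizeIfBeingWritten would then report getSyncedLength() for the file being written, so the replication reader never sees bytes that are not yet durable on HDFS.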

            People

              Assignee: comnetwork chenglei
              Reporter: comnetwork chenglei
