[HBASE-14004] [Replication] Inconsistency between Memstore and WAL may result in data in remote cluster that is not in the origin - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.0.0-alpha-4, 2.0.0
Component/s: regionserver, Replication
Labels:
- replication
- wal

Hadoop Flags: Reviewed
Release Note:

Now, when replicating a WAL file that is still open for writing, we get its committed length from the WAL instance in the same RS, to avoid replicating uncommitted WALEdits.

This is especially important if you use AsyncFSWAL, because AsyncFSWAL uses fan-out. Data written to a DN becomes visible immediately, since every DN believes it is the end of a pipeline, even though the client has not yet received an ack. The NN may also truncate the file if the client crashes at that moment.


Description

It looks like the current write path can cause an inconsistency between the memstore/HFile and the WAL, which can leave the slave cluster with more data than the master cluster.

The simplified write path looks like:
1. insert record into Memstore
2. write record to WAL
3. sync WAL
4. rollback Memstore if 3 fails

It is possible that the HDFS sync RPC call fails even though the data has already been (perhaps partially) transported to the DNs, where it eventually gets persisted. As a result, the handler rolls back the Memstore, and the HFile flushed later also omits this record, while the WAL still contains it and replication can ship it to the remote cluster.
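The failure mode described above can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: `ToyWal`, `ToyRegion`, and `failNextSync` are hypothetical names invented for illustration and are not HBase classes; the "WAL" is a plain list standing in for bytes that have reached the DNs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class WritePathSketch {
    /** Toy WAL (hypothetical): appended entries model bytes that already reached the DNs. */
    static final class ToyWal {
        final List<String> persisted = new ArrayList<>();
        boolean failNextSync = false;

        void append(String record) {
            persisted.add(record);
        }

        /** Models a sync RPC that fails although the data stays persisted. */
        void sync() throws IOException {
            if (failNextSync) {
                failNextSync = false;
                throw new IOException("sync failed (simulated)");
            }
        }
    }

    /** Toy region (hypothetical) using the simplified write path from the description. */
    static final class ToyRegion {
        final List<String> memstore = new ArrayList<>();
        final ToyWal wal = new ToyWal();

        boolean put(String record) {
            memstore.add(record);        // 1. insert record into Memstore
            wal.append(record);          // 2. write record to WAL
            try {
                wal.sync();              // 3. sync WAL
                return true;
            } catch (IOException e) {
                memstore.remove(record); // 4. rollback Memstore if 3 fails
                return false;
            }
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1");
        region.wal.failNextSync = true;
        boolean ok = region.put("row2");
        System.out.println("put(row2) succeeded: " + ok);
        System.out.println("memstore: " + region.memstore);
        System.out.println("wal:      " + region.wal.persisted);
    }
}
```

In this model the second put fails and is rolled back from the memstore, yet the record remains in the toy WAL, which is the only thing a replication reader would see.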

==================================

This is a long-lived issue. The problem above has been solved by reordering the write path: we now sync the WAL before modifying the memstore. But the problem may still exist, because the replication thread may read the new data before we return from hflush. See this document for more details:

https://docs.google.com/document/d/11AyWtGhItQs6vsLRIx32PwTxmBY3libXwGXI25obVEY/edit#

So we need to keep a synced length in the WAL and tell the replication WAL reader that this length is the limit when it reads this WAL file.
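A minimal sketch of that idea, assuming a toy in-memory WAL: the writer records how much has been acknowledged by a sync, and the replication reader never reads past that limit while the file is still open. All names here (`ToyWal`, `committedLengthIfBeingWritten`, `readUpTo`, `replicate`) are hypothetical and chosen for illustration; they are not the actual HBase API, and lengths are counted in entries rather than bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

public class SyncedLengthSketch {
    /** Toy WAL (hypothetical) that tracks written entries and the synced prefix. */
    static final class ToyWal {
        private final List<String> entries = new ArrayList<>();
        private long syncedCount = 0;  // number of entries covered by the last sync
        private boolean open = true;   // still opened for write

        synchronized void append(String entry) {
            entries.add(entry);
        }

        /** Models a successful sync: everything written so far is now committed. */
        synchronized void sync() {
            syncedCount = entries.size();
        }

        /** Toy assumption: closing the file commits everything written to it. */
        synchronized void close() {
            syncedCount = entries.size();
            open = false;
        }

        /** Committed length while the file is being written, empty once it is closed. */
        synchronized OptionalLong committedLengthIfBeingWritten() {
            return open ? OptionalLong.of(syncedCount) : OptionalLong.empty();
        }

        /** Returns a copy of at most the first {@code limit} entries. */
        synchronized List<String> readUpTo(long limit) {
            int end = (int) Math.min(limit, entries.size());
            return new ArrayList<>(entries.subList(0, end));
        }
    }

    /** Replication reader: bounded by the committed length if the WAL is still open. */
    static List<String> replicate(ToyWal wal) {
        long limit = wal.committedLengthIfBeingWritten().orElse(Long.MAX_VALUE);
        return wal.readUpTo(limit);
    }

    public static void main(String[] args) {
        ToyWal wal = new ToyWal();
        wal.append("edit1");
        wal.sync();
        wal.append("edit2"); // written but not yet synced
        System.out.println("before sync: " + replicate(wal));
        wal.sync();
        System.out.println("after sync:  " + replicate(wal));
    }
}
```

The design point is that the reader asks the writer in the same process for the limit instead of trusting the length visible in the file system, which may include data that was never acknowledged.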

Attachments


- HBASE-14004-v3.patch (15/Sep/17 08:06, 79 kB, Duo Zhang)
- HBASE-14004-v2.patch (15/Sep/17 06:53, 86 kB, Duo Zhang)
- HBASE-14004-v2.patch (14/Sep/17 09:35, 86 kB, Duo Zhang)
- HBASE-14004-v1.patch (12/Sep/17 02:48, 85 kB, Duo Zhang)
- HBASE-14004.patch (11/Sep/17 13:17, 84 kB, Duo Zhang)

Issue Links

breaks
- HBASE-21503 Replication normal source can get stuck due potential race conditions between source wal reader and wal provider initialization threads. (Resolved)
- HBASE-18845 TestReplicationSmallTests fails after HBASE-14004 (Resolved)

is related to
- HBASE-24625 AsyncFSWAL.getLogFileSizeIfBeingWritten does not return the expected synced file length. (Resolved)
- HBASE-28184 Tailing the WAL is very slow if there are multiple peers. (Resolved)

relates to
- HBASE-5954 Allow proper fsync support for HBase (Closed)
- HBASE-14790 Implement a new DFSOutputStream for logging WAL only (Closed)

links to
- fsync behind and fsync on close (and HDFS-744)
- Review Board
- The new design for HBase writing logic to prevent data loss and inconsistency (HBASE-14004)


People

Assignee: Duo Zhang
Reporter: He Liangliang
Votes: 0
Watchers: 28

Dates

Created: 01/Jul/15 12:32
Updated: 01/Nov/23 22:46
Resolved: 15/Sep/17 12:34