Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
None
None
None
Description
Although the fast-path tailer uses a quorum read to pull the edit log, it appears that it can still read uncommitted data in some corner cases.
Here is an example. Suppose we have three JournalNodes (JNs) whose initial state is:
epoch 1:
JN1: [1-3] (in-progress)
JN2: [1-3] (in-progress)
JN3: [1-4] (in-progress)
Note that in epoch 1, txids 1-3 were committed, but txid 4 was not.
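Why txids 1-3 count as committed and txid 4 does not follows from the majority rule: with three JNs, a write needs acknowledgements from two of them. The toy model below is a hypothetical illustration, not HDFS code. It assumes every segment starts at txid 1 and reduces each JN to the last txid it holds, then reports the highest txid held by a majority. In this particular state that matches the committed prefix. In general, commitment is decided by writer acknowledgements, not by what happens to be on disk.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical toy model: each JN is reduced to the last txid it holds
 * (segments are assumed to start at txid 1). With 3 JNs the majority is 2.
 */
public class CommitCheck {

    /** Highest txid that is present on at least a majority of JNs. */
    static long highestMajorityTxId(Map<String, Long> lastTxIdByJn) {
        int majority = lastTxIdByJn.size() / 2 + 1;
        long[] ends = lastTxIdByJn.values().stream()
                .mapToLong(Long::longValue)
                .sorted()
                .toArray();
        // Sorted ascending, so the value at index (n - majority) is held
        // by at least `majority` JNs.
        return ends[ends.length - majority];
    }

    public static void main(String[] args) {
        // Epoch 1 state from the description above.
        Map<String, Long> epoch1 = new TreeMap<>();
        epoch1.put("JN1", 3L);
        epoch1.put("JN2", 3L);
        epoch1.put("JN3", 4L);
        // Prints "majority holds up to txid 3"
        System.out.println("majority holds up to txid " + highestMajorityTxId(epoch1));
    }
}
```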
Now a failover occurs, and the new writer cannot reach JN3 because of a network partition. It still completes the recovery stage with JN1 and JN2, then writes a new txid 4 in epoch 2 whose content differs from JN3's txid 4.
epoch 2:
JN1: [1-3] (finalized), [4-4] (in-progress)
JN2: [1-3] (finalized), [4-4] (in-progress)
JN3: [1-4] (in-progress)
Note that the content of txid 4 on JN3 differs from txid 4 on the other JNs.
Now a reading NameNode pulls edits. It contacts JN3 and JN2 and receives a majority of responses, but the two logs have the same length and different content, and there is no further information to decide which log is correct. If it chooses JN3, the metadata is corrupted.
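The ambiguity can be reproduced in a simplified model of a quorum read that compares only transaction counts. This is an assumption made for illustration; the real QJM reader is more involved. Edit contents are modeled as placeholder strings, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a count-only quorum read. Each response is the list
 * of edits a JN returned; op contents are placeholder strings.
 */
public class CountOnlyQuorumRead {

    /** True if all responses have the same length but their contents disagree. */
    static boolean ambiguous(Map<String, List<String>> responses) {
        long distinctLengths = responses.values().stream()
                .mapToInt(List::size)
                .distinct()
                .count();
        long distinctContents = responses.values().stream()
                .distinct()
                .count();
        return distinctLengths == 1 && distinctContents > 1;
    }

    public static void main(String[] args) {
        // Epoch 2: JN2's txid 4 was written by the new writer,
        // JN3's txid 4 is the stale, uncommitted entry from epoch 1.
        List<String> jn2 = List.of("tx1", "tx2", "tx3", "tx4@epoch2");
        List<String> jn3 = List.of("tx1", "tx2", "tx3", "tx4@epoch1");
        Map<String, List<String>> quorum = Map.of("JN2", jn2, "JN3", jn3);
        // Prints "ambiguous = true": a count-only reader cannot tell them apart.
        System.out.println("ambiguous = " + ambiguous(quorum));
    }
}
```

Both responses report four transactions, so any rule that looks only at lengths must pick one arbitrarily.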
There is an example test patch [^example.patch] for running and debugging this scenario.
To fix it, I think we should add the finalized state to GetJournaledEditsResponseProto, so that the reader can discard the faulty log.
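One possible rule the proposed finalized state could enable is sketched below. It is a hypothetical sketch, not the actual patch. Segment, Response and discardStale are illustrative names, not fields of GetJournaledEditsResponseProto. The rule assumes each JN reports its segments with a finalized flag. If another JN reports a segment starting at the same txid as finalized at txid E, then any in-progress segment with that start that extends past E holds uncommitted edits, so that response is dropped.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: responses carry per-segment finalized state, and the
 * reader discards responses that contradict a finalized segment.
 */
public class FinalizedAwareRead {

    record Segment(long firstTxId, long lastTxId, boolean finalized) {}

    record Response(String jn, List<Segment> segments) {}

    /**
     * Keeps a response unless one of its in-progress segments starts at the
     * same txid as a finalized segment reported by any JN but extends past
     * that finalized end (those extra txids were never committed).
     */
    static List<Response> discardStale(List<Response> responses) {
        List<Response> kept = new ArrayList<>();
        for (Response r : responses) {
            boolean stale = false;
            for (Segment s : r.segments()) {
                if (s.finalized()) {
                    continue;
                }
                for (Response other : responses) {
                    for (Segment f : other.segments()) {
                        if (f.finalized() && f.firstTxId() == s.firstTxId()
                                && s.lastTxId() > f.lastTxId()) {
                            stale = true;
                        }
                    }
                }
            }
            if (!stale) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Epoch 2 state from the description above.
        Response jn2 = new Response("JN2", List.of(
                new Segment(1, 3, true), new Segment(4, 4, false)));
        Response jn3 = new Response("JN3", List.of(new Segment(1, 4, false)));
        // Prints "usable: JN2"; JN3's [1-4] overruns JN2's finalized [1-3].
        for (Response r : discardStale(List.of(jn2, jn3))) {
            System.out.println("usable: " + r.jn());
        }
    }
}
```

This rule only helps when at least one responder in the quorum has already finalized the conflicting segment. Whether that is always guaranteed on the fast path would need to be checked against the QJM recovery protocol.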
Attachments
Issue Links
- is related to
  - HDFS-13150 [Edit Tail Fast Path] Allow SbNN to tail in-progress edits from JN via RPC (Resolved)