ZOOKEEPER-3104 describes a critical data inconsistency risk.
The risk also exists in branch-3.4.
In our 3.4.13 production cluster, data inconsistency has occurred many times.
After digging through transaction logs and snapshots, we believe that
ZOOKEEPER-3104 is the main contributor to our data inconsistency.
The probability of hitting this risk may be higher than one would expect in a real production environment. Serializing a large DataTree can open a long risk window under high write traffic, and any failure during that window would cause data inconsistency.
Data inconsistency is hardly acceptable under ZooKeeper's semantics.
This issue is already fixed in 3.6, but I think it is necessary to backport
ZOOKEEPER-3104 to branch-3.4, especially since migrating from 3.4 to 3.5 takes considerable effort to evaluate the compatibility risk in a real production environment.
I will submit a GitHub pull request to fix it. Could anyone help us review it, please?