Description
In a stress cluster, I saw one tablet get "stuck" in the following state:
- the transaction_tracker on all three replicas is "full" (no more transactions can be submitted)
- leader elections proceed just fine, but no leader is able to advance the commit index
The issue seems to be that a replica responds with 'CANNOT_PREPARE' when its transaction tracker is full. The leader ignores this response and does not advance the majority-replicated watermark. The transaction tracker then stays full forever, because the in-flight transactions can never be committed.
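To make the suspected cycle concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not Kudu's code: `Replica`, `leader_round`, `TRACKER_LIMIT`, and the "OK" response string are hypothetical names, the stuck state is taken as the starting point (how the cluster got there is not modeled), and the leader's own tracker bookkeeping for the new operation is left out.

```python
from dataclasses import dataclass, field

TRACKER_LIMIT = 3  # hypothetical cap on in-flight transactions per replica


@dataclass
class Replica:
    name: str
    in_flight: list = field(default_factory=list)  # op indexes held by the tracker
    commit_index: int = 0

    def apply_commit(self, commit_index):
        # Committing is the only thing that frees tracker slots in this model.
        self.commit_index = max(self.commit_index, commit_index)
        self.in_flight = [i for i in self.in_flight if i > self.commit_index]

    def handle_replicate(self, op_index, leader_commit_index):
        # Follower side: accept a new op, or refuse it if the tracker is full.
        self.apply_commit(leader_commit_index)
        if len(self.in_flight) >= TRACKER_LIMIT:
            return "CANNOT_PREPARE"
        self.in_flight.append(op_index)
        return "OK"


def leader_round(leader, followers, op_index):
    acks = 1  # the leader counts itself
    for follower in followers:
        response = follower.handle_replicate(op_index, leader.commit_index)
        if response == "OK":
            acks += 1
        # A CANNOT_PREPARE response is simply dropped, as described above.
    if acks > (len(followers) + 1) // 2:
        leader.apply_commit(op_index)  # majority reached: advance the commit index
    return leader.commit_index


def simulate(rounds):
    # Start from the observed state: every tracker is full of uncommitted ops.
    full = list(range(1, TRACKER_LIMIT + 1))
    leader, *followers = [Replica(n, in_flight=list(full)) for n in "ABC"]
    for _ in range(rounds):
        leader_round(leader, followers, TRACKER_LIMIT + 1)
    return leader.commit_index, [len(r.in_flight) for r in (leader, *followers)]


print(simulate(5))  # (0, [3, 3, 3])
```

However many rounds are run, the commit index stays at 0 and every tracker stays full: a slot is only freed by a commit, and a commit requires a follower to accept an operation, which it cannot do without a free slot.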
Notes to follow.
Issue Links
- is related to KUDU-1787 Tablet fails to start pending replicates due to transaction tracker limit (Open)