Description
After "seekToEnd" is called, "KafkaConsumer.position" may return a wrong offset set by another reset request.
Here is a reproducer: https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246
In this reproducer, "poll(0)" will send an "earliest" request in background. However, after "seekToEnd" is called, due to a race condition in "Fetcher.resetOffsetIfNeeded" (It's not atomic, "seekToEnd" could happen between the check https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246#diff-b45245913eaae46aa847d2615d62cde0R585 and the seek https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246#diff-b45245913eaae46aa847d2615d62cde0R605), "KafkaConsumer.position" may return an "earliest" offset.
Attachments
Issue Links
- causes
-
SPARK-26267 Kafka source may reprocess data
- Resolved
- relates to
-
SPARK-27281 Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets
- In Progress
- links to