Spark: SPARK-31040

Offsets are logged only for partitions that had data, so when using Kafka the next batch reads the omitted partitions from the beginning


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.4.0, 2.4.4, 2.4.5
    • Fix Version/s: None
    • Component/s: Structured Streaming

    Description

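      Each batch should either log offsets for every partition, including partitions that received no new data, or the reader should scan back across earlier offset logs to recover the last offset of any partition missing from the latest entry.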

      https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
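The "scan back across offset logs" option could look roughly like the sketch below. It is only an illustration in Python, not Spark's implementation: the function name `carry_forward_offsets` is hypothetical, and the two entries are the offset log contents for batches 23615 and 23616 quoted in this report.

```python
import json

def carry_forward_offsets(offset_logs):
    """Merge offset log entries (oldest first), keeping the most recent
    known offset for every topic partition.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    merged = {}
    for entry in offset_logs:
        for topic, partitions in entry.items():
            # Later entries overwrite earlier offsets; partitions absent
            # from a later entry keep their previously logged offset.
            merged.setdefault(topic, {}).update(partitions)
    return merged

# Offset log entries copied from this report.
log_23615 = json.loads(
    '{"myTopic.myTopic.orders":{"2":27531503,"5":27562423,"4":27528794,'
    '"1":27514991,"3":27528899,"0":27504949}}'
)
log_23616 = json.loads('{"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}')

topic = "myTopic.myTopic.orders"

# Partitions that were logged in batch 23615 but are absent from 23616.
missing = sorted(set(log_23615[topic]) - set(log_23616[topic]))
print("missing from 23616:", missing)

# Scanning back restores an offset for every partition.
merged = carry_forward_offsets([log_23615, log_23616])
print("merged offsets:", merged[topic])
```

With the merged view, partitions 2 to 5 keep the offsets logged in batch 23615 instead of being treated as newly added partitions.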

      offset log 23615

      {"myTopic.myTopic.orders":{"2":27531503,"5":27562423,"4":27528794,"1":27514991,"3":27528899,"0":27504949}}

      offset log 23616

      {"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}


      20/03/04 13:49:05 INFO MicroBatchExecution: Resuming at batch 26317 with committed offsets {KafkaV2[Subscribe[myTopic.myTopic.orders]]: {"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}} and available offsets {KafkaV2[Subscribe[myTopic.myTopic.orders]]: {"myTopic.myTopic.orders":{"2":27531625,"5":27562568,"4":27528990,"1":27515131,"3":27529075,"0":27505141}}}
      commit log: {"myTopic.myTopic.orders":{"1":27515130,"0":27505140}}
      20/03/04 13:50:24 INFO KafkaMicroBatchReader: Partitions added: Map(myTopic.myTopic.orders-3 -> 26533520, myTopic.myTopic.orders-2 -> 26533730, myTopic.myTopic.orders-4 -> 26533608, myTopic.myTopic.orders-5 -> 26533486)
      20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-3 starts from 26533520 instead of 0. Some data may have been missed.
      20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-2 starts from 26533730 instead of 0. Some data may have been missed.
      20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-4 starts from 26533608 instead of 0. Some data may have been missed.
      20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-5 starts from 26533486 instead of 0. Some data may have been missed.
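To get a feel for the size of the problem, the sketch below parses the WARN lines above and compares each restart offset with the offset logged for the same partition in batch 23615. It assumes, and this is an assumption rather than something the logs state, that the batch 23615 offsets mark data already processed, so the difference approximates how many records per partition would be read again. The regular expression and the helper name `parse_restart_offsets` are illustrative only.

```python
import re

# Matches the KafkaMicroBatchReader warning format shown in this report.
WARN_PATTERN = re.compile(
    r"Added partition (\S+)-(\d+) starts from (\d+) instead of 0"
)

warn_lines = [
    "20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-3 starts from 26533520 instead of 0. Some data may have been missed.",
    "20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-2 starts from 26533730 instead of 0. Some data may have been missed.",
    "20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-4 starts from 26533608 instead of 0. Some data may have been missed.",
    "20/03/04 13:50:24 WARN KafkaMicroBatchReader: Added partition myTopic.myTopic.orders-5 starts from 26533486 instead of 0. Some data may have been missed.",
]

# Offsets for the affected partitions, copied from offset log 23615.
last_logged = {2: 27531503, 5: 27562423, 4: 27528794, 3: 27528899}

def parse_restart_offsets(lines):
    """Return {partition: restart_offset} for every matching WARN line."""
    restarts = {}
    for line in lines:
        match = WARN_PATTERN.search(line)
        if match:
            _topic, partition, offset = match.groups()
            restarts[int(partition)] = int(offset)
    return restarts

restarts = parse_restart_offsets(warn_lines)

# Records between the restart offset and the last logged offset.
reread = {p: last_logged[p] - restarts[p] for p in sorted(restarts)}
for partition, count in reread.items():
    print(f"partition {partition}: about {count} records read again")
```

Under that assumption, each of the four partitions restarts roughly one million offsets behind the position recorded in batch 23615.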
      
      

       

       

          People

            Assignee: Unassigned
            Reporter: Richard Gilmore (giltech)
            Votes: 2
            Watchers: 1
