Kafka / KAFKA-9268

Follow-on: Streams Threads may die from recoverable errors with EOS enabled

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0
    • Fix Version/s: None
    • Component/s: streams
    • Labels: None

    Description

      While testing Streams in EOS mode under frequent and heavy network partitions, I encountered exceptions that led to thread death in both 2.2 and 2.3 (although the exceptions were different).

      I believe this problem is addressed in 2.4+ by https://issues.apache.org/jira/browse/KAFKA-9231. However, as the ticket and its corresponding PR show, the solution there introduced some tech debt around UnknownProducerId that still needs to be cleaned up, so I'm not backporting that fix to older branches. Instead, I'm opening this ticket to track more conservative changes that would improve resilience in the older branches, if desired.

      These failures are relatively rare, so I don't think a system or integration test could reproduce them reliably. The steps to reproduce would be:
      1. set up a long-running Streams application with EOS enabled (I used three Streams instances)
      2. inject periodic network partitions (I had each Streams instance schedule an interruption to start at a random time between 0 and 3 hours out and to last a random duration between 0 and 5 minutes. Each interruption uses iptables to drop all traffic to and from all three brokers)
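
      The two steps above can be sketched roughly as follows. This is only an illustration, not the harness actually used for the test: the class name, method names, application id, and broker addresses are hypothetical, and the iptables rules are an assumption about how "drop all traffic to/from a broker" might be expressed. The configuration keys are the standard Streams ones, with "exactly_once" being the EOS setting in 2.2 and 2.3.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.Random;

// Hypothetical sketch of the reproduction setup; not the reporter's actual test code.
final class EosPartitionTestSketch {

    // Upper bounds from the description: wait up to 3 hours, then interrupt for up to 5 minutes.
    static final Duration MAX_DELAY = Duration.ofHours(3);
    static final Duration MAX_OUTAGE = Duration.ofMinutes(5);

    /** Step 1: Streams configuration with exactly-once processing enabled. */
    static Properties eosConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    /** Step 2: pick a uniformly random duration in [0, max). */
    static Duration randomUpTo(Duration max, Random random) {
        return Duration.ofMillis((long) (random.nextDouble() * max.toMillis()));
    }

    /** Assumed iptables rules that drop all traffic from and to one broker address. */
    static List<List<String>> dropRules(String brokerIp) {
        return Arrays.asList(
            Arrays.asList("iptables", "-A", "INPUT", "-s", brokerIp, "-j", "DROP"),
            Arrays.asList("iptables", "-A", "OUTPUT", "-d", brokerIp, "-j", "DROP"));
    }

    public static void main(String[] args) {
        Random random = new Random();
        // These properties would be passed to the KafkaStreams constructor in a real application.
        System.out.println(eosConfig("eos-soak-test", "broker1:9092,broker2:9092,broker3:9092"));
        System.out.println("interrupt after " + randomUpTo(MAX_DELAY, random)
            + " for " + randomUpTo(MAX_OUTAGE, random));
        // Only prints the commands; running them needs root on the Streams host.
        for (List<String> rule : dropRules("10.0.0.1")) {
            System.out.println(String.join(" ", rule));
        }
    }
}
```

      In this sketch the rules are only printed. To end an interruption, the same rules would be deleted again (iptables -D in place of -A) once the chosen outage duration has elapsed.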

      As for the actual errors I've observed, I'm attaching the logs of two incidents in which a thread shut down.

      Attachments

        1. 2.2-eos-failures-2.txt
          140 kB
          John Roesler
        2. 2.2-eos-failures-1.txt
          68 kB
          John Roesler

            People

              Assignee: Unassigned
              Reporter: John Roesler (vvcephei)
              Votes: 0
              Watchers: 6