Kafka / KAFKA-9374

Worker can be disabled by blocked connectors

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1
    • Fix Version/s: 2.6.0
    • Component/s: connect
    • Labels: None

    Description

      If a connector hangs during any of its initialize, start, stop, taskConfigs, taskClass, version, config, or validate methods, the worker will be disabled for some types of requests thereafter, including connector creation, connector reconfiguration, and connector deletion.
      This affects both distributed and standalone mode. Distributed herders perform some connector work synchronously in their tick thread, which also handles group membership and some REST requests, so a hung connector call stalls that thread. In standalone mode, most of the herder's methods are synchronized, including those for creating, updating, and deleting connectors; while one of those methods blocks, every subsequent call to any of them blocks as well.
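
      The standalone-mode behavior can be illustrated with a minimal sketch. The class and method names below (SynchronizedHerderSketch, createConnector, deleteConnector, deleteIsBlocked) are hypothetical stand-ins, not Kafka Connect code; the sketch only assumes that the herder's public methods share one monitor, as described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a herder whose public methods are all synchronized.
class SynchronizedHerderSketch {

    // Simulates a create request whose connector hangs: the herder's monitor
    // is held until the 'hang' latch is released.
    synchronized void createConnector(CountDownLatch entered, CountDownLatch hang)
            throws InterruptedException {
        entered.countDown(); // signal that the monitor is now held
        hang.await();        // stand-in for a connector method that never returns
    }

    // An unrelated request that needs the same monitor.
    synchronized String deleteConnector(String name) {
        return "deleted " + name;
    }

    // Returns true if a delete request cannot complete within waitMs
    // while a create request is hung inside the herder.
    static boolean deleteIsBlocked(long waitMs) {
        SynchronizedHerderSketch herder = new SynchronizedHerderSketch();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch hang = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.submit(() -> {
                herder.createConnector(entered, hang);
                return null;
            });
            entered.await(); // make sure the hung call owns the monitor first
            Future<String> delete = pool.submit(() -> herder.deleteConnector("blocked-connector"));
            try {
                delete.get(waitMs, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true; // the delete request is stuck behind the hung create
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            hang.countDown();   // release the simulated hang
            pool.shutdownNow(); // clean up the worker threads
        }
    }
}
```

      In this sketch the delete request never touches the hung connector, yet it cannot proceed until the create call returns, which mirrors the blocked-worker symptom reported here.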

       

      One potential solution is to treat connectors that fail to start, stop, etc. in time the same way as tasks that fail to stop within the task graceful shutdown timeout: handle all connector interactions on a separate thread, wait up to a timeout for each to complete, and, if the timeout expires, abandon the thread and transition the connector to the FAILED state (if it has been created at all).
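
      A minimal sketch of that idea follows, using only java.util.concurrent. It is an illustration of the proposal, not the change that was eventually merged; ConnectorCallTimeoutSketch, callWithTimeout, and the two-value State enum are hypothetical names, and the timeout value is left to the caller.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a single connector call off the caller's thread
// and bounds how long the caller waits for it.
class ConnectorCallTimeoutSketch {

    // Simplified stand-in for connector status; only the two states needed here.
    enum State { RUNNING, FAILED }

    static State callWithTimeout(Runnable connectorCall, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connector-call");
            t.setDaemon(true); // an abandoned thread must not keep the JVM alive
            return t;
        });
        Future<?> result = executor.submit(connectorCall);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return State.RUNNING;
        } catch (TimeoutException e) {
            // Best-effort interrupt; if the connector ignores it, the thread
            // is simply abandoned and the caller moves on.
            result.cancel(true);
            return State.FAILED;
        } catch (ExecutionException e) {
            return State.FAILED; // the connector call threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return State.FAILED;
        } finally {
            executor.shutdown(); // accept no further work; does not wait for a hung call
        }
    }
}
```

      With this shape, the thread that serves REST requests or group membership waits at most timeoutMs per connector call, at the cost of possibly leaking a thread for each connector that never returns.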

            People

              Assignee: Chris Egerton
              Reporter: Chris Egerton
              Reviewer: Konstantine Karantasis
              Votes: 0
              Watchers: 4
