Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-13563

FindCoordinatorFuture never get cleared in non-group mode( consumer#assign)

    XMLWordPrintableJSON

Details

    Description

      In KAFKA-10793, we fix the race condition when lookup coordinator by clearing the findCoordinatorFuture when handling the result, rather than in the listener callbacks. It works well under consumer group mode (i.e. Consumer#subscribe), but we found when user is using non consumer group mode (i.e. Consumer#assign) with group id provided (for offset commitment, so that there will be consumerCoordinator created), the findCoordinatorFuture will never be cleared in some situations, and cause the offset committing keeps getting NOT_COORDINATOR error.

       

      After KAFKA-10793, we clear the findCoordinatorFuture in 2 places:

      1. heartbeat thread
      2. AbstractCoordinator#ensureCoordinatorReady

      But in non consumer group mode with group id provided, there will be no (1)heartbeat thread , and it only call (2)AbstractCoordinator#ensureCoordinatorReady when 1st time consumer wants to fetch committed offset position. That is, after 2nd lookupCoordinator call, we have no chance to clear the findCoordinatorFuture .

       

      To avoid the race condition as KAFKA-10793 mentioned, it's not safe to clear the findCoordinatorFuture in the future listener. So, I think we can fix this issue by calling AbstractCoordinator#ensureCoordinatorReady when coordinator unknown in non consumer group case, under each Consumer#poll.

       

      Reproduce steps:
       
      1. Start a 3 Broker cluster with a Topic having Replicas=3.
      2. Start a Client with Producer and Consumer (with Consumer#assign(), not subscribe, and provide a group id) communicating over the Topic.
      3. Stop the Broker that is acting as the Group Coordinator.
      4. Observe successful Rediscovery of new Group Coordinator.
      5. Restart the stopped Broker.
      6. Stop the Broker that became the new Group Coordinator at step 4.
      7. Observe "Rediscovery will be attempted" message but no "Discovered group coordinator" message.
       
       

       

      Attachments

        1. kafka.zip
          55 kB
          James Olsen

        Issue Links

          Activity

            People

              showuon Luke Chen
              showuon Luke Chen
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: