Kafka / KAFKA-7136

PushHttpMetricsReporter may deadlock when processing metrics changes

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.1.0, 2.0.0
    • Fix Version/s: 1.1.1, 2.0.0
    • Component/s: metrics
    • Labels: None

    Description

      We noticed a deadlock in PushHttpMetricsReporter. KAFKA-6765 changed the locking for metrics to avoid a NullPointerException in metrics reporters caused by concurrent reads and updates. PushHttpMetricsReporter acquires its own lock to process metric registration, and that registration callback is invoked while the sensor lock is held. When reporting, it also reads metric values, which requires the sensor lock, while holding its own lock, so the two locks are taken in the inverse order. This resulted in the deadlock below.
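
The inverse lock order described above can be illustrated in isolation. The following is a minimal sketch, not Kafka code: `LockInversionSketch`, `sensorLock`, and `reporterLock` are hypothetical names, and `ReentrantLock.tryLock` with a timeout stands in for the unbounded monitor waits of the real classes so that the demonstration terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; none of these names are Kafka classes.
public class LockInversionSketch {

    /** Returns how many of the two threads failed to acquire their second lock. */
    static int countBlockedThreads() {
        ReentrantLock sensorLock = new ReentrantLock();   // stands in for the Sensor monitor
        ReentrantLock reporterLock = new ReentrantLock(); // stands in for the reporter's lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Registration path: sensor lock first, then reporter lock.
        Thread registering = new Thread(() ->
            acquireInOrder(sensorLock, reporterLock, bothHoldFirst, bothTried, blocked));
        // Reporting path: reporter lock first, then sensor lock (inverse order).
        Thread reporting = new Thread(() ->
            acquireInOrder(reporterLock, sensorLock, bothHoldFirst, bothTried, blocked));
        registering.start();
        reporting.start();
        try {
            registering.join();
            reporting.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
            CountDownLatch bothHoldFirst, CountDownLatch bothTried, AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A bounded tryLock replaces an unbounded wait so the sketch terminates.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads blocked on second lock: " + countBlockedThreads());
    }
}
```

Both threads time out on their second lock because each one holds the lock the other needs. With `synchronized` blocks, as in the thread dump, the same interleaving would block forever.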

      Found one Java-level deadlock:
      Java stack information for the threads listed above:
      ===================================================
      "StreamThread-7":
        at org.apache.kafka.tools.PushHttpMetricsReporter.metricChange(PushHttpMetricsReporter.java:144)
        - waiting to lock <0x0000000655a54310> (a java.lang.Object)
        at org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:563)
        - locked <0x0000000655a44a28> (a org.apache.kafka.common.metrics.Metrics)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:236)
        - locked <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
        at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:217)
        at org.apache.kafka.common.network.Selector$SelectorMetrics.maybeRegisterConnectionMetrics(Selector.java:1016)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:462)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:425)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:510)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:271)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:242)
        at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:218)
        at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:274)
        at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:254)
        at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1820)
        at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1798)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:224)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:121)
        at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:74)
        at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:317)
        at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:824)
        at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
        at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)

      "pool-17-thread-1":
        at org.apache.kafka.common.metrics.KafkaMetric.measurableValue(KafkaMetric.java:82)
        - waiting to lock <0x000000065629c170> (a org.apache.kafka.common.metrics.Sensor)
        at org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:58)
        at org.apache.kafka.tools.PushHttpMetricsReporter$HttpReporter.run(PushHttpMetricsReporter.java:177)
        - locked <0x0000000655a54310> (a java.lang.Object)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

      Found 1 deadlock.
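
One general way to avoid the cycle shown in the dump is for the reporter to copy its registered metrics while holding its own lock, release that lock, and only then read metric values. The sketch below illustrates that idea under stated assumptions and is not necessarily the change applied for this issue: `SnapshotReporterSketch`, `FakeSensor`, and `Reporter` are hypothetical stand-ins, not Kafka classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins; these are not the Kafka classes from the stack trace.
public class SnapshotReporterSketch {

    /** Stand-in for a sensor whose value is read under the sensor's own monitor. */
    static final class FakeSensor {
        private double total;
        synchronized void record(double v) { total += v; }
        synchronized double value() { return total; }
    }

    /** Reporter that never holds its own lock while acquiring a sensor lock. */
    static final class Reporter {
        private final Object lock = new Object();
        private final Map<String, FakeSensor> metrics = new HashMap<>();

        // May be called by a thread that already holds the sensor lock.
        void metricChange(String name, FakeSensor sensor) {
            synchronized (lock) {
                metrics.put(name, sensor);
            }
        }

        Map<String, Double> report() {
            Map<String, FakeSensor> snapshot;
            synchronized (lock) {
                snapshot = new HashMap<>(metrics); // copy under the reporter lock only
            }
            // Reporter lock is released here; value() takes only the sensor lock.
            Map<String, Double> values = new TreeMap<>();
            for (Map.Entry<String, FakeSensor> e : snapshot.entrySet()) {
                values.put(e.getKey(), e.getValue().value());
            }
            return values;
        }
    }

    /** Runs registration and reporting concurrently; true if both threads finish. */
    static boolean completesWithoutDeadlock() {
        Reporter reporter = new Reporter();
        FakeSensor sensor = new FakeSensor();
        Thread registering = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                synchronized (sensor) { // sensor lock, then reporter lock
                    reporter.metricChange("m" + (i % 10), sensor);
                }
            }
        });
        Thread reporting = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                reporter.report();
            }
        });
        registering.setDaemon(true);
        reporting.setDaemon(true);
        registering.start();
        reporting.start();
        try {
            registering.join(5_000);
            reporting.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !registering.isAlive() && !reporting.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("completed without deadlock: " + completesWithoutDeadlock());
    }
}
```

The trade-off of this design is that a report may miss a metric registered after the snapshot was taken; in this sketch the next report would pick it up.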

            People

              Assignee: Rajini Sivaram (rsivaram)
              Reporter: Rajini Sivaram (rsivaram)