Apache NiFi / NIFI-8196

When a node is disconnected due to failing to service a request, upon cluster reconnection it may not participate in leader election


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.13.0
    • Component/s: Core Framework
    • Labels: None

    Description

      NIFI-7920 fixed a bug that can result in nodes getting the wrong Revision for some components. That fix, however, appears to have introduced a regression. When a node is disconnected because it failed to service a replicated API request (for example, a request to start, stop, or move a component), it now unregisters from leader election for the Primary Node and Cluster Coordinator roles. If it later reconnects, however, it does not re-register for those roles. As a result, a node can disconnect, reconnect, and then never be able to become Cluster Coordinator. If this happens to every node in the cluster, no node is eligible to become Cluster Coordinator at all. This results in logs such as:

      2021-02-03 20:14:55,167 WARN [Clustering Tasks Thread-3] o.apache.nifi.controller.FlowController Failed to send heartbeat due to: java.lang.IllegalArgumentException: Cannot send heartbeat to address []. Address must be in <hostname>:<port> format 

      And errors in the UI stating:

      Action cannot be performed because there is currently no Cluster Coordinator elected. The request should be tried again after a moment, after a Cluster Coordinator has been automatically elected.. Returning Service Unavailable response. 

      At this point, no Cluster Coordinator will be elected until the nodes are restarted.
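      The failure mode described above can be illustrated with a small simulation. This is a hypothetical, single-process sketch, not NiFi's implementation: the `Election` and `Node` classes, their `register`/`unregister` methods, and the `bounceAll` helper are illustrative names only, and real NiFi clusters elect roles through their clustering framework rather than an in-memory set. Each node disconnects (dropping its candidacy) and then reconnects in turn. Unless reconnection re-registers the node, the Cluster Coordinator role ends up vacant with no candidates left to fill it.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of the NIFI-8196 failure mode.
// Election, Node and bounceAll are illustrative names, not NiFi APIs.
public class ElectionSketch {

    /** Candidates for a single role (e.g. Cluster Coordinator) and the current holder. */
    static final class Election {
        private final Set<String> candidates = new LinkedHashSet<>();
        private String leader;

        void register(String nodeId) {
            candidates.add(nodeId);
            if (leader == null) {
                leader = nodeId; // simplification: a vacant role goes to the first registrant
            }
        }

        void unregister(String nodeId) {
            candidates.remove(nodeId);
            if (nodeId.equals(leader)) {
                // Hand the role to a remaining candidate, or leave it vacant if none are left.
                leader = candidates.isEmpty() ? null : candidates.iterator().next();
            }
        }

        Optional<String> leader() {
            return Optional.ofNullable(leader);
        }
    }

    static final class Node {
        private final String id;
        private final Election coordinatorElection;
        private final boolean reRegisterOnReconnect;

        Node(String id, Election coordinatorElection, boolean reRegisterOnReconnect) {
            this.id = id;
            this.coordinatorElection = coordinatorElection;
            this.reRegisterOnReconnect = reRegisterOnReconnect;
            coordinatorElection.register(id); // every node starts as a candidate
        }

        /** Disconnection after failing to service a replicated request drops the candidacy. */
        void disconnectForFailedRequest() {
            coordinatorElection.unregister(id);
        }

        /** The regression: reconnecting does not restore the candidacy unless the flag is set. */
        void reconnect() {
            if (reRegisterOnReconnect) {
                coordinatorElection.register(id);
            }
        }
    }

    /** Disconnects and reconnects each node in turn, then returns the coordinator, if any. */
    public static Optional<String> bounceAll(boolean reRegisterOnReconnect) {
        Election election = new Election();
        List<Node> nodes = List.of(
                new Node("node1", election, reRegisterOnReconnect),
                new Node("node2", election, reRegisterOnReconnect),
                new Node("node3", election, reRegisterOnReconnect));
        for (Node node : nodes) {
            node.disconnectForFailedRequest();
            node.reconnect();
        }
        return election.leader();
    }

    public static void main(String[] args) {
        System.out.println("without re-registration: " + bounceAll(false)); // Optional.empty
        System.out.println("with re-registration: " + bounceAll(true));     // Optional[node1]
    }
}
```

      Without re-registration, the last node to disconnect leaves the role vacant and nobody is left to claim it, which matches the "no Cluster Coordinator elected" state above. With re-registration on reconnect, the role is always picked up by a returning candidate.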


            People

              Assignee: Mark Payne (markap14)
              Reporter: Mark Payne (markap14)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m