Flink / FLINK-35145

Add timeout for cluster termination

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.20.0
    • Fix Version/s: 1.20.0
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      Currently, cluster termination can block forever because the termination process has no timeout. For example, in an Application cluster with ZooKeeper (ZK) HA enabled, if the ZK cluster is down, the Flink cluster still reaches a terminal status, but the termination process then blocks while trying to clean up HA data on ZK, because the ZK client retries connecting to ZK indefinitely. A similar hang can be observed during an HDFS outage.

      I propose adding a timeout to the cluster termination process in the ClusterEntrypoint#shutDownAsync method.
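
      To illustrate the idea, here is a minimal sketch that bounds an asynchronous cleanup step using only the JDK's CompletableFuture#orTimeout (Java 9+). It is not Flink code: the class TerminationTimeoutSketch and the methods cleanupHaDataAsync and terminateWithTimeout are hypothetical names made up for this example, and the hanging future merely stands in for an HA cleanup whose ZK client retries forever. How the timeout would actually be wired into shutDownAsync is left open.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TerminationTimeoutSketch {

    /** Hypothetical stand-in for an HA cleanup that hangs, e.g. while ZK is unreachable. */
    static CompletableFuture<Void> cleanupHaDataAsync() {
        return new CompletableFuture<>(); // intentionally never completed
    }

    /**
     * Bounds the wait on the cleanup future (assumed helper, not a Flink API).
     * Completes with true if cleanup finished in time and false if the timeout
     * elapsed; any other failure is propagated unchanged.
     */
    static CompletableFuture<Boolean> terminateWithTimeout(
            CompletableFuture<Void> cleanup, long timeout, TimeUnit unit) {
        return cleanup
                .thenApply(ignored -> true)
                .orTimeout(timeout, unit) // fails this stage with TimeoutException
                .exceptionally(error -> {
                    // Unwrap a CompletionException if one was added along the chain.
                    Throwable cause =
                            error instanceof CompletionException && error.getCause() != null
                                    ? error.getCause()
                                    : error;
                    if (cause instanceof TimeoutException) {
                        return false; // give up on cleanup and let termination proceed
                    }
                    throw new CompletionException(cause);
                });
    }

    public static void main(String[] args) {
        boolean cleaned =
                terminateWithTimeout(cleanupHaDataAsync(), 200, TimeUnit.MILLISECONDS).join();
        System.out.println("cleanup finished: " + cleaned); // prints "cleanup finished: false"
    }
}
```

      Note that orTimeout only stops waiting: it does not cancel the underlying cleanup work, so a real change would still need to decide what happens to HA data that was left behind when the timeout fires.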

          People

            Assignee: Unassigned
            Reporter: Zhanghao Chen