Flink / FLINK-34589

FineGrainedSlotManager doesn't handle errors in the resource reconciliation step

Details

    • Type: Technical Debt
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 1.19.0, 1.18.1, 1.20.0
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      While working on FLINK-34427, I noticed that the resource reconciliation is scheduled periodically when the SlotManager starts, but errors thrown in this step are not handled. I see two options here:
      1. Fail fatally because such an error might indicate a major issue with the RM backend.
      2. Log the failure and continue the scheduled task even in case of an error.

      My understanding is that a failure here only means we're not able to recreate TaskManagers, which should be a transient issue that can be resolved in the backend (YARN, k8s). That's why I would lean towards option 2.
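
      To make option 2 concrete, here is a minimal sketch using only the plain JDK ScheduledExecutorService. It does not use any Flink classes, and ReconcileSketch, logAndContinue and runDemo are hypothetical names made up for illustration. Flink's own executors may behave differently, so treat this as a rough outline of the idea rather than a proposed patch.

      ```java
      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.atomic.AtomicInteger;

      // Hypothetical sketch, not Flink code.
      final class ReconcileSketch {

          /** Wraps a periodic task so that a failure is logged instead of propagated. */
          static Runnable logAndContinue(Runnable task) {
              return () -> {
                  try {
                      task.run();
                  } catch (Exception e) {
                      // Option 2: report the failure and keep the schedule alive.
                      System.err.println("Resource reconciliation failed, will retry: " + e);
                  }
              };
          }

          /** Schedules a task that fails on its first run and returns how often it was invoked. */
          static int runDemo(boolean guarded) {
              AtomicInteger attempts = new AtomicInteger();
              // Stand-in for the reconciliation step: the first attempt fails.
              Runnable flaky = () -> {
                  if (attempts.incrementAndGet() == 1) {
                      throw new IllegalStateException("backend temporarily unavailable");
                  }
              };
              ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
              executor.scheduleWithFixedDelay(
                      guarded ? logAndContinue(flaky) : flaky, 0, 10, TimeUnit.MILLISECONDS);
              try {
                  Thread.sleep(300);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
              } finally {
                  executor.shutdownNow();
              }
              return attempts.get();
          }

          public static void main(String[] args) {
              System.out.println("unguarded attempts: " + runDemo(false));
              System.out.println("guarded attempts:   " + runDemo(true));
          }
      }
      ```

      With the plain JDK executor, the unguarded run stops after the first failure, because scheduleWithFixedDelay suppresses all subsequent executions once a run throws. The guarded run keeps retrying, which is the behaviour option 2 asks for.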

      xtsong WDYT?

            People

              Assignee: Unassigned
              Reporter: Matthias Pohl (mapohl)
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated: