  1. Hadoop YARN
  2. YARN-5078 [Umbrella] NodeManager health checker improvements
  3. YARN-5644

Define exit code for allowing NodeManager health script to mar

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • 3.0.0-alpha2
    • None
    • Component/s: nodemanager

    Description

      Done as an alternate design to YARN-5567. Define a specific exit code for the health checker script (property yarn.nodemanager.health-checker.script.path) that allows the node to be blacklisted.

      As discussed in the latter part of YARN-5567, the current design requirements are:

      1. Ignore all exit codes from the script
        1. The one exception is the newly defined exit code, which will mark the NodeManager as UNHEALTHY
        2. This allows any syntax or functional errors in the script to be ignored
      2. Upon failure (or multiple recorded failures):
        1. Store the status in the metrics2 state on the NodeManager
        2. Allow the RM to blacklist the NM or allow the jobs to drain
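
The requirements above can be sketched roughly as follows. This is a minimal illustration in Python, not the NodeManager's actual implementation. The issue does not state which exit code is reserved, so the value 77 is a hypothetical placeholder, as are the names classify_exit_code, HealthTracker, and failure_threshold. Resetting the failure count after a healthy run is also an assumption.

```python
import subprocess
import sys

# Hypothetical value: the issue does not say which exit code is reserved.
UNHEALTHY_EXIT_CODE = 77


def classify_exit_code(exit_code, unhealthy_code=UNHEALTHY_EXIT_CODE):
    """Map a script exit code to a node state.

    Only the reserved code yields UNHEALTHY. Every other code, including
    those produced by syntax or functional errors in the script, is ignored.
    """
    return "UNHEALTHY" if exit_code == unhealthy_code else "HEALTHY"


class HealthTracker:
    """Counts consecutive UNHEALTHY results (hypothetical helper)."""

    def __init__(self, failure_threshold=1):
        # Number of consecutive failures needed before the node is flagged.
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, exit_code):
        """Record one script run; return True once the threshold is reached."""
        if classify_exit_code(exit_code) == "UNHEALTHY":
            self.consecutive_failures += 1
        else:
            # Assumption: any non-reserved exit code resets the count.
            self.consecutive_failures = 0
        return self.consecutive_failures >= self.failure_threshold


def run_health_script(argv):
    """Run the health script and return its exit code without raising."""
    return subprocess.run(argv, check=False).returncode
```

With failure_threshold=2, for example, a single run that exits with the reserved code does not flag the node, while a second consecutive one does. In the real design, the flagged state would be stored in metrics2 and reported to the RM, which this sketch does not model.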

            People

              Assignee: yufeigu Yufei Gu
              Reporter: rchiang Ray Chiang