Hadoop YARN / YARN-1380

Enable NM to automatically reuse failed local dirs after they are available again


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: nodemanager

    Description

      Currently the NodeManager (NM) can take bad local directories out of service when they fail, but it cannot put them back into service once they have been fixed. This is inconvenient in large production clusters.
      In this JIRA I propose a patch that I am using in my organization.
      The patch also adds a new metric for the number of failed directories, so that operators have a clearer view from outside.
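
The behavior described above can be illustrated with a minimal Java sketch. It is not the patch attached to this issue and does not use the NodeManager's real classes: `LocalDirTracker`, `isUsable`, `checkDirs` and `getNumFailedDirs` are hypothetical names chosen for illustration, and the health test (directory exists and is readable, writable and executable) is an assumption. The point is only that each periodic check looks at the failed directories as well as the good ones, so a repaired directory moves back into service, and that the size of the failed set can be exposed as a metric.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: tracks good and failed local dirs and re-admits repaired ones. */
public class LocalDirTracker {
    private final Set<File> goodDirs = new LinkedHashSet<>();
    private final Set<File> failedDirs = new LinkedHashSet<>();

    public LocalDirTracker(List<File> dirs) {
        // All configured directories start out as good until a check says otherwise.
        goodDirs.addAll(dirs);
    }

    /** Assumed health test: the path is a directory with read, write and execute access. */
    static boolean isUsable(File dir) {
        return dir.isDirectory() && dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /**
     * Re-checks every known directory, including previously failed ones.
     * Returns true if any directory changed state.
     */
    public synchronized boolean checkDirs() {
        List<File> all = new ArrayList<>(goodDirs);
        all.addAll(failedDirs);
        boolean changed = false;
        for (File dir : all) {
            boolean usable = isUsable(dir);
            if (usable && failedDirs.remove(dir)) {
                goodDirs.add(dir);      // repaired: put back into service
                changed = true;
            } else if (!usable && goodDirs.remove(dir)) {
                failedDirs.add(dir);    // newly failed: take out of service
                changed = true;
            }
        }
        return changed;
    }

    public synchronized List<File> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }

    /** Value that could be published as the "number of failed directories" metric. */
    public synchronized int getNumFailedDirs() {
        return failedDirs.size();
    }
}
```

In such a design, a scheduler would call `checkDirs()` at a fixed interval and publish `getNumFailedDirs()` after each run; how the real NodeManager schedules its disk checks and registers metrics is outside this sketch.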


            People

              Assignee: Unassigned
              Reporter: Hou Song (thehousong)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Original Estimate: 48h
                  Remaining Estimate: 48h
                  Time Spent: Not Specified