[YARN-1380] Enable NM to automatically reuse failed local dirs after they are available again - ASF JIRA

Details

Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: nodemanager
Labels: features

Description

Currently the NM can remove bad local directories from use when they fail, but it cannot reuse them once they have been fixed. This is inconvenient in large production clusters.
In this JIRA I propose a patch that I am using in my organization.
It also adds a new metric for the number of failed directories, so that operators have a clearer view from outside.
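
The behaviour described above can be illustrated with a minimal sketch. It is not the attached patch and not the NodeManager's actual code: the class `LocalDirTracker`, the helper `is_usable`, and the property `num_failed_dirs` are hypothetical names chosen for illustration, and the health check is a simplified assumption. The idea is that each periodic check re-tests the failed directories as well as the good ones, so a repaired directory moves back into the usable list, and the failed count is exposed as a metric.

```python
import os


def is_usable(path):
    """Assumed health check: an existing directory we can read, write, and traverse."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


class LocalDirTracker:
    """Hypothetical tracker for local dirs; illustrative only, not Hadoop's API."""

    def __init__(self, dirs, check=is_usable):
        self._check = check
        self.good_dirs = list(dirs)
        self.failed_dirs = []

    def check_dirs(self):
        """Re-test every configured dir, including ones that failed earlier.

        Returns True if the set of good dirs changed since the last check.
        """
        good, failed = [], []
        for d in self.good_dirs + self.failed_dirs:
            (good if self._check(d) else failed).append(d)
        changed = set(good) != set(self.good_dirs)
        self.good_dirs, self.failed_dirs = good, failed
        return changed

    @property
    def num_failed_dirs(self):
        """Metric: number of directories currently considered failed."""
        return len(self.failed_dirs)
```

In such a design, a periodic health-check task would call `check_dirs()` and publish `num_failed_dirs` to the metrics system; a directory that disappears is dropped on one pass and picked up again on a later pass once it is available.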

Issue Links

is duplicated by: YARN-90 NodeManager should identify failed disks becoming good again (Closed)

People

Assignee: Unassigned
Reporter: Hou Song
Votes: 0
Watchers: 6

Dates

Created: 31/Oct/13 07:47
Updated: 31/Oct/13 19:26
Resolved: 31/Oct/13 19:26

Time Tracking

Estimated: 48h
Remaining: 48h
Logged: Not Specified