Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
Description
Currently the NodeManager (NM) can remove bad directories from use when they fail, but it cannot reuse them once they have been fixed. This is inconvenient in large production clusters.
In this JIRA I propose a patch that I am already using in my organization.
It also adds a new metric for the number of failed directories, so that operators get a clearer view from outside the node.
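The idea can be illustrated with a minimal sketch. This is not the attached patch and not the NodeManager's actual disk-checking code: the class `RecoverableDirChecker`, its methods, and the health test (directory exists and is readable, writable, and executable) are all hypothetical names and assumptions chosen for illustration. The point it shows is that every periodic check re-tests the failed directories as well as the good ones, so a repaired directory returns to service, and that the size of the failed set is what a "number of failed directories" metric could report.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; not the NodeManager's real implementation. */
class RecoverableDirChecker {
    private final Set<String> goodDirs = new LinkedHashSet<>();
    private final Set<String> failedDirs = new LinkedHashSet<>();

    RecoverableDirChecker(List<String> dirs) {
        // All configured directories start out as good until the first check.
        goodDirs.addAll(dirs);
    }

    /** Assumed health test: the path is a directory with read, write, and execute access. */
    private static boolean isHealthy(String dir) {
        File f = new File(dir);
        return f.isDirectory() && f.canRead() && f.canWrite() && f.canExecute();
    }

    /** Re-tests good AND failed directories, so a repaired directory becomes usable again. */
    synchronized void checkDirs() {
        List<String> all = new ArrayList<>(goodDirs);
        all.addAll(failedDirs);
        goodDirs.clear();
        failedDirs.clear();
        for (String dir : all) {
            if (isHealthy(dir)) {
                goodDirs.add(dir);
            } else {
                failedDirs.add(dir);
            }
        }
    }

    synchronized List<String> getGoodDirs() {
        return new ArrayList<>(goodDirs);
    }

    /** Value that a failed-directory-count metric could expose. */
    synchronized int getNumFailedDirs() {
        return failedDirs.size();
    }
}
```

In such a design, `checkDirs()` would be called on a timer, and the metric would simply read `getNumFailedDirs()` after each pass.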
Issue Links
- is duplicated by: YARN-90 NodeManager should identify failed disks becoming good again (Closed)