Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
Sometimes when a container fails, it can be pretty hard to figure out why it failed.
My proposal is that if a container fails, we collect information about the container local dir and dump it into the container log dir. Ideally, I'd like to tar up the directory entirely, but I'm not sure of the security and space implications of such an approach. At the very least, we can list all the files in the container local dir and copy the contents of launch_container.sh into the container log dir.
When log aggregation runs, all of this information will be collected automatically, making such failures much easier to debug.
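A minimal sketch of the lighter-weight option (a file listing plus a copy of launch_container.sh), written in Java because that is YARN's implementation language. The class `ContainerDebugInfo`, its `dump(localDir, logDir)` method and the listing file name `directory.info` are all hypothetical names chosen for illustration; this is not the NodeManager's actual API or the committed fix.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not a real YARN class): when a container fails, record
 * what its local dir looked like in the log dir so log aggregation picks it up.
 */
public final class ContainerDebugInfo {

    // Hypothetical name for the listing file written into the log dir.
    static final String LISTING_FILE = "directory.info";
    static final String LAUNCH_SCRIPT = "launch_container.sh";

    private ContainerDebugInfo() {}

    /** Writes a recursive listing of localDir and a copy of launch_container.sh into logDir. */
    public static void dump(Path localDir, Path logDir) throws IOException {
        Files.createDirectories(logDir);

        // Walk the container local dir (symlinks are not followed) and collect every entry.
        List<Path> entries;
        try (Stream<Path> paths = Files.walk(localDir)) {
            entries = paths.sorted().collect(Collectors.toList());
        }

        // One line per entry: type ("d" or "f"), size in bytes, path relative to localDir.
        List<String> lines = new ArrayList<>();
        for (Path p : entries) {
            String rel = localDir.relativize(p).toString();
            if (rel.isEmpty()) {
                continue; // skip the root directory itself
            }
            boolean dir = Files.isDirectory(p);
            long size = Files.isRegularFile(p) ? Files.size(p) : 0L;
            lines.add((dir ? "d " : "f ") + size + " " + rel);
        }
        Files.write(logDir.resolve(LISTING_FILE), lines, StandardCharsets.UTF_8);

        // Copy the launch script verbatim, if present, so the exact launch command survives.
        Path script = localDir.resolve(LAUNCH_SCRIPT);
        if (Files.isRegularFile(script)) {
            Files.copy(script, logDir.resolve(LAUNCH_SCRIPT),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Listing only names and sizes, rather than tarring the whole directory, avoids most of the space and security concerns raised above. launch_container.sh itself exports the container's environment variables, though, so copying it into the log dir still deserves a security review.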
Attachments
Issue Links
- duplicates: YARN-3755 Log the command of launching containers (Resolved)