In my experience, it is very common for an MR job to fail completely because a single Mapper/Reducer container uses more memory than has been reserved in YARN. The following message is logged in the MapReduce ApplicationMaster:
In this case, the container is re-launched on another node and, of course, it is killed again for the same reason. This repeats until the task exhausts its attempts (four by default, per mapreduce.map.maxattempts and mapreduce.reduce.maxattempts), at which point the entire MapReduce job fails. It's often said that the definition of insanity is doing the same thing over and over and expecting different results.
For all intents and purposes, the amount of resources requested by Mappers and Reducers is fixed, based on the default configuration values. Users can set the memory on a per-job basis, but doing so is a pain, is not exact, and requires intimate knowledge of the MapReduce framework and its memory usage patterns.
I propose that if the MR ApplicationMaster detects that a container was killed because of this specific memory resource constraint, it request a larger container for the subsequent task attempt.
For example, increase the requested memory size by 50% each time the container fails and the task is retried. This would prevent many job failures and allow additional per-job memory tuning after the fact to get better performance (vs. fail/succeed).
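To make the 50% escalation concrete, here is a minimal sketch of how the next request size could be computed. It is not existing MapReduce code: the class name, method name, and parameters are hypothetical, and capping the request at the cluster's maximum container size (e.g. the value of yarn.scheduler.maximum-allocation-mb) is an assumption I'm adding, since a request above that limit could not be granted anyway.

```java
public final class EscalatingMemory {
    // Hypothetical growth factor: request 50% more memory after each
    // attempt that was killed for exceeding its memory limit.
    static final double GROWTH = 1.5;

    /**
     * Computes the container size to request for the next task attempt.
     *
     * @param baseMb          size of the first attempt, e.g. the configured
     *                        mapreduce.map.memory.mb value
     * @param memoryFailures  number of earlier attempts killed for memory
     * @param maxMb           assumed upper bound on a single container
     */
    static int nextRequestMb(int baseMb, int memoryFailures, int maxMb) {
        double scaled = baseMb * Math.pow(GROWTH, memoryFailures);
        long rounded = (long) Math.ceil(scaled);
        // Never ask for more than the cluster can allocate.
        return (int) Math.min(rounded, (long) maxMb);
    }

    public static void main(String[] args) {
        // Print the request size for four attempts of a 1024 MB task.
        for (int failures = 0; failures < 4; failures++) {
            System.out.println("after " + failures + " memory failures: "
                + nextRequestMb(1024, failures, 8192) + " MB");
        }
    }
}
```

With a 1024 MB starting size, the requests grow to 1536, 2304, and 3456 MB on successive retries. A real implementation would presumably also need to scale the task JVM heap (set separately through mapreduce.map.java.opts / mapreduce.reduce.java.opts), since a larger container alone does not raise the -Xmx limit.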