Description
ShuffleScheduler::killSelf kills the current task attempt when it encounters certain errors. As part of cleanup, it invokes close, which releases resources internally.
If a merge is in progress at that point, close can throw the following exception. The exception is caught in RunShuffleCallable and reported to the AM immediately, which causes the task to fail.
Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:320)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ConcurrentModificationException
    at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1211)
    at java.util.TreeMap$KeyIterator.next(TreeMap.java:1265)
    at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
    at java.util.ArrayList.addAll(ArrayList.java:577)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:636)
    at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:316)
    ... 6 more
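The "Caused by" frames show a TreeMap-backed collection being copied with ArrayList.addAll while it was structurally modified. The following is a simplified, single-threaded sketch of that fail-fast behavior; it is not the Tez code, and the class, method, and variable names (TreeSetCmeDemo, failsAfterConcurrentAdd, outputs) are hypothetical. In the real failure the modification presumably comes from a merge thread rather than from the same thread.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeSet;

public class TreeSetCmeDemo {

    /**
     * Returns true if advancing an iterator after a structural
     * modification of the underlying TreeSet throws a
     * ConcurrentModificationException.
     */
    static boolean failsAfterConcurrentAdd() {
        // Hypothetical stand-in for a sorted collection of merge outputs.
        TreeSet<String> outputs = new TreeSet<>();
        outputs.add("a");
        outputs.add("b");

        // Start iterating, as toArray()/addAll() would do internally.
        Iterator<String> it = outputs.iterator();
        it.next();

        // Structural modification while the iteration is still in progress.
        outputs.add("c");

        try {
            it.next(); // fail-fast check happens here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + failsAfterConcurrentAdd());
    }
}
```

TreeMap iterators are fail-fast on a best-effort basis, so in a multi-threaded run the exception appears only intermittently, which is consistent with the failure depending on a merge being in progress.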
When isShutDown is already set to true, the attempt is being killed anyway, so it would be better not to report such errors to the AM.
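One possible shape for such a guard is sketched below. This is an illustration under stated assumptions, not the actual Tez change: the class ShutdownAwareReporter and its methods are hypothetical, and the in-memory list merely stands in for whatever mechanism reports failures to the AM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownAwareReporter {

    // Mirrors the isShutDown flag mentioned in the description (assumed to be a boolean flag).
    private final AtomicBoolean isShutDown = new AtomicBoolean(false);

    // Hypothetical stand-in for the channel that reports errors to the AM.
    private final List<String> reportedErrors = new ArrayList<>();

    /** Marks the component as shut down; later errors are not reported. */
    public void shutDown() {
        isShutDown.set(true);
    }

    /**
     * Reports the error unless shutdown has already been requested.
     * Returns true if the error was reported, false if it was suppressed.
     */
    public synchronized boolean reportIfRunning(Throwable t) {
        if (isShutDown.get()) {
            // Cleanup-time failures are expected while shutting down; skip reporting.
            return false;
        }
        reportedErrors.add(t.toString());
        return true;
    }

    /** Returns a copy of the errors reported so far. */
    public synchronized List<String> reportedErrors() {
        return new ArrayList<>(reportedErrors);
    }

    public static void main(String[] args) {
        ShutdownAwareReporter reporter = new ShutdownAwareReporter();
        reporter.reportIfRunning(new RuntimeException("fetch failed"));
        reporter.shutDown();
        reporter.reportIfRunning(new java.util.ConcurrentModificationException());
        System.out.println("Reported errors: " + reporter.reportedErrors().size());
    }
}
```

With a check like this, an exception raised by close during shutdown would be suppressed, while errors raised before shutdown would still be reported.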