  1. Apache Tez
  2. TEZ-3767

Shuffle should not report error to AM during inputContext.killSelf()

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: None
    • Labels: None

    Description

      ShuffleScheduler::killSelf kills the current attempt when it encounters certain errors. As part of cleanup, it invokes close, which releases the resources.

      If a merge is in progress at that point, close can throw the following exception. The exception is caught in RunShuffleCallable and reported to the AM immediately, which causes the task to fail.

      Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:320)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
      Caused by: java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1211)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1265)
        at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
        at java.util.ArrayList.addAll(ArrayList.java:577)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:636)
        at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:316)
        ... 6 more
      
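      The Caused by frames show ArrayList.addAll iterating a TreeMap-backed collection inside MergeManager.close, and the fail-fast iterator noticing that the collection changed underneath it. A minimal single-threaded sketch of that behaviour follows. CmeDemo and pending are hypothetical names for illustration, not Tez code, and in the real failure the modification presumably comes from the merge running on another thread:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical demo class; it only illustrates the fail-fast iterator check.
class CmeDemo {
    // Returns true if the iterator detects the structural modification.
    static boolean failsFast() {
        TreeSet<Integer> pending = new TreeSet<>(Arrays.asList(1, 2, 3));
        Iterator<Integer> it = pending.iterator();
        it.next();          // iteration has started
        pending.add(4);     // structural change outside the iterator
        try {
            it.next();      // the iterator sees the changed modification count
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fail-fast triggered: " + failsFast());
    }
}
```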

      Once isShutDown is set to true, the shuffle should avoid sending error messages to the AM.
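      The idea can be sketched as a guard around error reporting. This is an illustration under stated assumptions, not the attached patch: ShutdownAwareReporter, runFinalMerge, and the reportedToAm list are hypothetical stand-ins for the real Shuffle and input-context classes, and only the isShutDown flag name comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the shuffle's error-reporting path; not Tez code.
class ShutdownAwareReporter {
    private final AtomicBoolean isShutDown = new AtomicBoolean(false);
    // Stands in for messages that would be sent to the AM.
    private final List<String> reportedToAm = new ArrayList<>();

    // Called when the attempt is being torn down (for example by killSelf).
    void shutdown() {
        isShutDown.set(true);
    }

    // Runs the cleanup step and reports a failure only if not shutting down.
    void runFinalMerge(Runnable closeMerger) {
        try {
            closeMerger.run();
        } catch (RuntimeException e) {
            if (isShutDown.get()) {
                // Failure is a side effect of teardown; do not report it.
                return;
            }
            reportedToAm.add("Error while doing final merge: " + e);
        }
    }

    List<String> reported() {
        return reportedToAm;
    }
}
```

      With this shape, a failure before shutdown() is still reported, while the same failure after shutdown() is dropped.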

      Attachments

        1. TEZ-3767.1.patch
          3 kB
          Rajesh Balamohan
        2. TEZ-3767.2.patch
          5 kB
          Rajesh Balamohan
        3. TEZ-3767.2.patch
          3 kB
          Rajesh Balamohan
        4. TEZ-3767.3.patch
          5 kB
          Rajesh Balamohan

          People

            Assignee: Rajesh Balamohan (rajesh.balamohan)
            Reporter: Rajesh Balamohan (rajesh.balamohan)
            Votes: 0
            Watchers: 4
