Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
3.1.1
None
None
Description
The AMReporter of LLAP throws RuntimeExceptions from within addTaskAttempt and removeTaskAttempt. These can bring LLAP down.
As an interim fix (see HIVE-22113), the RuntimeException thrown by removeTaskAttempt is caught in TaskRunnerCallable, which prevents LLAP from terminating if a killed task is not found in AMReporter.
Ideally, removeTaskAttempt would just log this case (a gone task is a gone task), and addTaskAttempt would throw a checked exception instead. When that checked exception is caught, we should fail the task attempt, as there is already an attempt with this ID running.
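A minimal sketch of the proposed behavior, to make the idea concrete. This is not the actual AMReporter code: AttemptRegistry, DuplicateTaskAttemptException, TaskSubmitter and the String-based signatures are hypothetical stand-ins, and the real bookkeeping in LLAP is more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

// Hypothetical checked exception; the name is an assumption, not an existing Hive class.
class DuplicateTaskAttemptException extends Exception {
  DuplicateTaskAttemptException(String attemptId) {
    super("Task attempt already registered: " + attemptId);
  }
}

// Simplified stand-in for the attempt bookkeeping done by AMReporter.
class AttemptRegistry {
  private static final Logger LOG = Logger.getLogger(AttemptRegistry.class.getName());
  private final Map<String, String> attempts = new ConcurrentHashMap<>();

  // A duplicate ID is reported through a checked exception, so callers must handle it.
  void addTaskAttempt(String attemptId, String amHost) throws DuplicateTaskAttemptException {
    if (attempts.putIfAbsent(attemptId, amHost) != null) {
      throw new DuplicateTaskAttemptException(attemptId);
    }
  }

  // A missing attempt is only logged: a gone task is a gone task.
  // Returns true if the attempt was present and has been removed.
  boolean removeTaskAttempt(String attemptId) {
    if (attempts.remove(attemptId) == null) {
      LOG.warning("Task attempt not found on removal, ignoring: " + attemptId);
      return false;
    }
    return true;
  }
}

// Hypothetical caller: returns false when the attempt has to be failed.
class TaskSubmitter {
  static boolean submit(AttemptRegistry registry, String attemptId, String amHost) {
    try {
      registry.addTaskAttempt(attemptId, amHost);
      return true;
    } catch (DuplicateTaskAttemptException e) {
      // Fail only this attempt; the daemon keeps running.
      return false;
    }
  }
}
```

With this shape, neither a duplicate add nor a remove of an unknown attempt can propagate an unchecked exception up to the daemon. The caller decides how to fail the offending attempt.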