Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 2.4.8, 3.3.4
- Fix Version/s: None
- Component/s: None
Description
While running Spark Streaming applications on YARN in cluster mode, a reboot/shutdown of the node hosting the ApplicationMaster causes the application to stop the SparkContext and report a final status of SUCCEEDED.
The reboot/shutdown command sends a graceful stop signal ("kill -15", i.e., SIGTERM) to every process on the machine. In the Spark Streaming code path, this signal makes the application believe it has terminated normally. The log is as follows:
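As a minimal illustration of why the signal is indistinguishable from a clean stop (a standalone sketch, not Spark's actual ApplicationMaster code): the JVM runs shutdown hooks both on normal exit and on SIGTERM, so any status-reporting logic placed in such a hook cannot tell an external "kill -15" apart from an orderly shutdown.

```scala
object ShutdownHookDemo {
  def main(args: Array[String]): Unit = {
    // The JVM runs shutdown hooks for a normal exit AND for SIGTERM (kill -15),
    // so this hook cannot tell whether the process ended cleanly or was killed.
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run(): Unit =
        println("shutdown hook fired: a status reporter here would see a 'normal' stop")
    }))
    Thread.sleep(600000) // simulate a long-running streaming driver; try `kill -15 <pid>`
  }
}
```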
But in most cases a reboot/shutdown happens because of operator error, because other services need to be restarted, or because the operating system itself needs a restart. Is it appropriate for Spark to report a "SUCCEEDED" status in these situations?
This matters especially because many scheduling systems decide whether to restart a Spark job based on its final status, for example restarting only on "FAILED". When Spark Streaming reports "SUCCEEDED" instead, the scheduling system has no way to tell that the application was killed and should be resubmitted (see the sketch below).
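For context, such a scheduler typically polls YARN for the final status along these lines. This is a sketch using the standard Hadoop YarnClient API; the application ID is a placeholder, and the restart policy shown is assumed, not taken from any particular scheduler.

```scala
import org.apache.hadoop.yarn.api.records.{ApplicationId, FinalApplicationStatus}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

object RestartChecker {
  def main(args: Array[String]): Unit = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    // Placeholder application ID; a real scheduler would track this per job.
    val appId = ApplicationId.fromString("application_1700000000000_0001")
    val status = yarnClient.getApplicationReport(appId).getFinalApplicationStatus

    // A streaming app killed by a node reboot reports SUCCEEDED, so this
    // restart-on-FAILED policy never fires for it, which is exactly the problem.
    if (status == FinalApplicationStatus.FAILED) {
      println(s"$appId failed; resubmitting...")
    }
    yarnClient.stop()
  }
}
```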
Moreover, a Spark Streaming application is expected to run indefinitely, so a "SUCCEEDED" final status is ambiguous in any case.