The PySpark worker daemon reads from its stdin the PIDs of the workers it should kill. https://github.com/apache/spark/blob/1bb60ab8392adf8b896cc04fb1d060620cf09d8a/python/pyspark/daemon.py#L127
However, the worker process is forked from the daemon process, and stdin is not closed in the child after the fork. This means the worker, and any user code it runs, can read from stdin as well, consuming the bytes meant for the daemon and blocking it from receiving the PID to kill. This can cause real problems: the task reaper may detect that the task was never terminated and eventually kill the whole JVM.
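Reduced to a minimal sketch (the function names and the 4-byte integer framing below are illustrative assumptions, not the actual daemon.py code), the pattern looks like this:

```python
import os
import signal
import struct
import sys

def manager_read_loop():
    # Daemon side: block until the JVM writes the PID of a worker to kill.
    while True:
        data = sys.stdin.buffer.read(4)
        if len(data) < 4:
            break
        (worker_pid,) = struct.unpack("!i", data)
        os.kill(worker_pid, signal.SIGKILL)

def fork_worker(run_user_code):
    pid = os.fork()
    if pid == 0:
        # Child (worker): stdin is NOT closed here, so user code that reads
        # stdin (e.g. a `cat` subprocess) competes with the daemon for the
        # kill requests written by the JVM; this is the bug described above.
        run_user_code()
        os._exit(0)
    return pid
```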
Possible fixes could be:
- Closing stdin in the worker process right after the fork (see the sketch after this list).
- Creating a new socket to receive the PIDs to kill, instead of using stdin.
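A minimal sketch of the first option, assuming the fork happens in a helper like the one below (the name and placement are illustrative, not the actual Spark change):

```python
import os

def fork_worker_detached_from_stdin(run_user_code):
    pid = os.fork()
    if pid == 0:
        # Child: point stdin at /dev/null right away so the worker and any user
        # code it spawns can no longer consume the PIDs the JVM writes to the
        # daemon's stdin. Redirecting is safer than plainly closing fd 0,
        # because a later open() could otherwise be handed descriptor 0.
        devnull = os.open(os.devnull, os.O_RDONLY)
        os.dup2(devnull, 0)
        os.close(devnull)
        run_user_code()
        os._exit(0)
    return pid
```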
To reproduce:
- Paste the following code in the pyspark shell (a hedged reconstruction is given after this list).
- Press CTRL+C to cancel the job.
- Observe the message displayed in the shell.
- Run ps -xf to see that the cat process was in fact not killed.
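The original snippet is not reproduced here; the following is a hedged reconstruction of the kind of job described. Any task whose user code reads from stdin exhibits the contention, here via a cat subprocess:

```python
import subprocess

def read_stdin(iterator):
    # `cat` inherits the worker's stdin and blocks reading it, so the daemon
    # never receives the PID of the worker it is asked to kill.
    subprocess.check_output(["cat"])
    return iterator

# Run inside the pyspark shell, where `sc` is the active SparkContext.
sc.parallelize(range(1), 1).mapPartitions(read_stdin).count()
```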