Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-28483

Canceling a spark job using barrier mode but barrier tasks do not exit

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.4.3
    • 3.0.0
    • Spark Core
    • None

    Description

      Reproduce code:

      import time
      from pyspark import BarrierTaskContext
      
      n = 4
      
      def  task(x):
        context = BarrierTaskContext.get()
        this = next(x)
        if (this % 2 == 0):
          time.sleep(10000)
        context.barrier()
        return []
      
      sc.setLogLevel("INFO")
      sc.parallelize(list(range(n)), n).barrier().mapPartitions(task).collect()

      Run above code in pyspark shell and then print Ctrl + C to exit the job.

      Get logging like:

      19/02/05 01:07:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) has entered the global sync, current barrier epoch is 0.
      19/02/05 01:07:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) has entered the global sync, current barrier epoch is 0.
      19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
      19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 0.0 in stage 0.0 (TID 0), reason: Stage cancelled
      19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 1.0 in stage 0.0 (TID 1), reason: Stage cancelled
      19/02/05 01:07:47 INFO Executor: Executor is trying to kill task 3.0 in stage 0.0 (TID 3), reason: Stage cancelled
      19/02/05 01:07:50 WARN PythonRunner: Incomplete task 0.0 in stage 0 (TID 0) interrupted: Attempting to kill Python Worker
      19/02/05 01:07:50 INFO Executor: Executor killed task 0.0 in stage 0.0 (TID 0), reason: Stage cancelled
      19/02/05 01:07:50 WARN PythonRunner: Incomplete task 3.3 in stage 0 (TID 3) interrupted: Attempting to kill Python Worker
      19/02/05 01:07:50 INFO Executor: Executor killed task 3.0 in stage 0.0 (TID 3), reason: Stage cancelled
      19/02/05 01:07:50 WARN PythonRunner: Incomplete task 2.2 in stage 0 (TID 2) interrupted: Attempting to kill Python Worker
      19/02/05 01:07:50 INFO Executor: Executor killed task 2.0 in stage 0.0 (TID 2), reason: Stage cancelled
      19/02/05 01:07:50 WARN PythonRunner: Incomplete task 1.1 in stage 0 (TID 1) interrupted: Attempting to kill Python Worker
      19/02/05 01:07:50 INFO Executor: Executor killed task 1.0 in stage 0.0 (TID 1), reason: Stage cancelled
      19/02/05 01:08:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) waiting under the global sync since 1549328862443, has been waiting for 60 seconds, current barrier epoch is 0.
      19/02/05 01:08:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) waiting under the global sync since 1549328862522, has been waiting for 60 seconds, current barrier epoch is 0.
      19/02/05 01:09:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) waiting under the global sync since 1549328862443, has been waiting for 120 seconds, current barrier epoch is 0.
      19/02/05 01:09:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) waiting under the global sync since 1549328862522, has been waiting for 120 seconds, current barrier epoch is 0.
      19/02/05 01:10:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) waiting under the global sync since 1549328862443, has been waiting for 180 seconds, current barrier epoch is 0.
      19/02/05 01:10:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) waiting under the global sync since 1549328862522, has been waiting for 180 seconds, current barrier epoch is 0.
      19/02/05 01:11:42 INFO BarrierTaskContext: Task 3 from Stage 0(Attempt 0) waiting under the global sync since 1549328862443, has been waiting for 240 seconds, current barrier epoch is 0.
      19/02/05 01:11:42 INFO BarrierTaskContext: Task 1 from Stage 0(Attempt 0) waiting under the global sync since 1549328862522, has been waiting for 240 seconds, current barrier epoch is 0.
      

      Attachments

        Activity

          People

            weichenxu123 Weichen Xu
            weichenxu123 Weichen Xu
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: