Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-28699

Cache an indeterminate RDD could lead to incorrect result while stage rerun

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Blocker
    • Resolution: Fixed
    • 2.3.3, 2.4.3, 3.0.0
    • 2.3.4, 2.4.4, 3.0.0
    • Spark Core

    Description

      It's another case for the indeterminate stage/RDD rerun while stage rerun happened.

      We can reproduce this by the following code, thanks to Tyson for reporting this!
       

      import scala.sys.process._
      import org.apache.spark.TaskContext
      
      val res = spark.range(0, 10000 * 10000, 1).map{ x => (x % 1000, x)}
      // kill an executor in the stage that performs repartition(239)
      val df = res.repartition(113).cache.repartition(239).map { x =>
       if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 1 && TaskContext.get.stageAttemptNumber == 0) {
       throw new Exception("pkill -f -n java".!!)
       }
       x
      }
      
      val r2 = df.distinct.count()
      

      Attachments

        Issue Links

          Activity

            People

              XuanYuan Yuanjian Li
              XuanYuan Yuanjian Li
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: