Details
- Type: Improvement
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 4.0.0
- Fix Version/s: None
Description
Example application:

import java.util.concurrent.Executors

import org.apache.spark.sql.SparkSession

object SparkTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("zt-test")
      .config("spark.logConf", true)
      .getOrCreate()
    spark.sparkContext.setLogLevel("INFO")

    val threadPool = Executors.newFixedThreadPool(5)
    for (i <- 0 until 10) {
      threadPool.execute(new Task("Task " + i))
      // the thread pool is deliberately never shut down
    }

    val rdd = spark.sparkContext.makeRDD(Seq(1, 2, 4))
    val res = rdd.collect()

    // throw an exception from the user's main method
    throw new Throwable()
  }
}

class Task(private var name: String) extends Runnable {
  override def run(): Unit = {
    System.out.println("Executing task: " + name + " by " + Thread.currentThread.getName)
  }
}
When this app runs on YARN in cluster mode, the driver shuts down even though the thread pool was never shut down, so the container still stops. When it runs on Kubernetes, however, if the thread pool is not shut down, the driver pod gets stuck and never releases its resources.
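A user-side workaround, shown here only as a minimal sketch (the object name and task bodies are illustrative, not part of the reproduction above), is to shut the pool down after submitting the tasks, or to build it from daemon threads, so that no non-daemon worker threads keep the JVM alive:

import java.util.concurrent.{Executors, TimeUnit}

object PoolShutdownExample {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(5)
    for (i <- 0 until 10) {
      pool.execute(() => println(s"Executing task $i by ${Thread.currentThread.getName}"))
    }
    // Shutting the pool down lets the worker threads die once the queued
    // tasks finish, so the JVM can exit normally in both YARN and K8s modes.
    pool.shutdown()
    pool.awaitTermination(1, TimeUnit.MINUTES)
  }
}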
In yarn-cluster mode, the ApplicationMaster wraps the user's main in System.exit, like this:
ugi.doAs(new PrivilegedExceptionAction[Unit]() {
  override def run(): Unit = System.exit(master.run())
})
So even while leaked threads are still parked, the exit code is passed to System#exit and the JVM is torn down; in this situation the AM can stop.
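The reason this works is that System.exit terminates the JVM regardless of live non-daemon threads. A minimal standalone sketch of that behavior (the object name is illustrative):

import java.util.concurrent.locks.LockSupport

object ExitDespiteParkedThreads {
  def main(args: Array[String]): Unit = {
    // A non-daemon thread that parks forever; without System.exit the JVM
    // would never terminate, because this thread never finishes.
    new Thread(() => LockSupport.park()).start()
    // System.exit tears the JVM down anyway, which is why the YARN AM
    // container still stops even with leaked, parked threads.
    System.exit(0)
  }
}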
When the driver runs on Kubernetes in client mode, if it hits an exception and the thread pool has not been shut down, the driver pod may get stuck.
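One possible direction for this improvement, shown only as a hypothetical sketch (runUserMain and its parameters are made-up names, not Spark's actual K8s code path), is to apply the same System.exit wrapping around the user's main on Kubernetes:

object KubernetesDriverExitSketch {
  // Hypothetical wrapper: force the JVM down once the user's main returns or
  // throws, so leaked non-daemon threads cannot keep the driver pod alive.
  def runUserMain(userMain: Array[String] => Unit, args: Array[String]): Unit = {
    var exitCode = 0
    try {
      userMain(args)
    } catch {
      case t: Throwable =>
        t.printStackTrace()
        exitCode = 1
    } finally {
      System.exit(exitCode)
    }
  }
}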
Attachments
Issue Links