Spark / SPARK-31149

PySpark job not killing Spark Daemon processes after the executor is killed due to OOM


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 2.4.5
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
      2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
      2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
      2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
      Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
              |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar

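      The container monitor reports its limit and usage in bytes, which are easier to compare in one unit. The minimal Python sketch below uses only numbers copied from the log above: Limit=2147483648 and current usage = 3915513856 from the WARN line, plus the -Xmx1024m heap cap on the executor's command line. The log's "GB" appears to be binary, since 2147483648 bytes is exactly the reported 2 GB. The helper name summarize is hypothetical.

```python
GIB = 1024 ** 3  # binary gigabyte; 2147483648 bytes == 2 GiB, matching "2 GB" in the log


def summarize(limit_bytes, usage_bytes, heap_bytes):
    """Express YARN container-monitor figures in GiB, rounded to 2 places.

    Hypothetical helper for reading the log; not part of Spark or Hadoop.
    """
    return {
        "limit_gib": round(limit_bytes / GIB, 2),
        "usage_gib": round(usage_bytes / GIB, 2),
        # How far the whole process tree is over the container limit.
        "over_limit_gib": round((usage_bytes - limit_bytes) / GIB, 2),
        # Usage beyond the maximum JVM heap (-Xmx); this is a lower bound on
        # memory that the Java heap cannot account for.
        "beyond_heap_cap_gib": round((usage_bytes - heap_bytes) / GIB, 2),
        "usage_to_limit": round(usage_bytes / limit_bytes, 2),
    }


if __name__ == "__main__":
    # Values copied from the WARN line and the -Xmx1024m flag in the log.
    print(summarize(limit_bytes=2147483648,
                    usage_bytes=3915513856,
                    heap_bytes=1024 * 1024 ** 2))
```

      With these inputs the process tree is at about 3.65 GiB, roughly 1.82 times its 2 GiB limit, while the executor heap is capped at 1 GiB, so at least 2.65 GiB lies outside the Java heap. Attributing that remainder to JVM off-heap memory and to the Python worker processes in the same process tree is an assumption; the dump above lists only the java process.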
      After the container is killed, many pyspark.daemon processes are left running, for example:
          /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
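      To confirm the symptom on an affected host, one can list the leftover processes and their parent PIDs. The following is a minimal Linux-only sketch that assumes a readable /proc filesystem; the function names are hypothetical and not part of PySpark. It matches command lines containing "-m pyspark.daemon", the form shown above, and marks processes whose parent PID is 1, which on Linux usually means the original parent (presumably the killed executor JVM) has exited. Workers forked by the daemon inherit the same command line, so they appear too, with the daemon as their parent.

```python
import os


def is_pyspark_daemon(argv):
    """True if an argument vector runs the module `python -m pyspark.daemon`."""
    return any(a == "-m" and b == "pyspark.daemon" for a, b in zip(argv, argv[1:]))


def parse_ppid(stat):
    """Extract the parent PID from the raw bytes of /proc/<pid>/stat.

    Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    so split after the last ')': what follows is ' <state> <ppid> ...'.
    """
    return int(stat[stat.rindex(b")") + 1:].split()[1])


def find_pyspark_daemons(proc_root="/proc"):
    """List (pid, ppid, command line) for each visible pyspark.daemon process.

    Linux-only sketch: processes that exit or cannot be read mid-scan are skipped.
    """
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as f:
                # cmdline is NUL-separated; drop the trailing empty field.
                argv = [a.decode(errors="replace") for a in f.read().split(b"\0") if a]
            if not is_pyspark_daemon(argv):
                continue
            with open(os.path.join(base, "stat"), "rb") as f:
                ppid = parse_ppid(f.read())
        except (OSError, ValueError):
            continue
        found.append((int(entry), ppid, " ".join(argv)))
    return found


if __name__ == "__main__":
    for pid, ppid, cmd in find_pyspark_daemons():
        # PPID 1 means the process was re-parented to init; hosts that use a
        # child subreaper may show that subreaper's PID instead.
        note = "  (parent exited?)" if ppid == 1 else ""
        print(f"{pid}\tppid={ppid}\t{cmd}{note}")
```

      Reading /proc directly avoids any third-party dependency; a library such as psutil would do the same job more portably.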

People

    • Assignee: Unassigned
    • Reporter: Arsenii Venherak (indeoo)
    • Votes: 0
    • Watchers: 2