Zeppelin / ZEPPELIN-5897

Spark-Interpreter context change


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: spark
    • Labels: None

    Description

      I have encountered some strange behaviour in the Spark interpreter. The problem occurs when several cron jobs are started in parallel.

      The launch command itself looks correct:

      [INFO] Interpreter launch command: /opt/conda/lib/python3.9/site-packages/pyspark/bin/spark-submit --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer --driver-class-path /usr/share/java/*:/tmp/local-repo/spark_8g_8g/*:/opt/zeppelin/interpreter/spark/*:::/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar --driver-java-options   -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///opt/zeppelin/conf/log4j.properties -Dlog4j.configurationFile=file:///opt/zeppelin/conf/log4j2.properties -Dzeppelin.log.file=/opt/zeppelin/logs/zeppelin-interpreter-spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00--spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.log --conf spark.driver.maxResultSize=8g --conf spark.kubernetes.executor.request.cores=0. --conf spark.network.timeout=1800 --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog --verbose --conf spark.jars.ivySettings=/opt/spark/ivysettings.xml --proxy-user ejavaheri --conf spark.master=k8s://https://kubernetes.default.svc --conf spark.driver.memory=8g --conf spark.driver.cores=2 --conf spark.app.name=spark_8g_8g --conf spark.driver.host=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc --conf spark.kubernetes.memoryOverheadFactor=0.4 --conf spark.webui.yarn.useProxy=false --conf spark.blockManager.port=22322 --conf spark.driver.port=22321 --conf spark.driver.bindAddress=0.0.0.0 --conf spark.kubernetes.namespace=spark --conf spark.kubernetes.driver.request.cores=200m --conf spark.kubernetes.driver.pod.name=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren --conf spark.executor.instances=1 --conf spark.executor.memory=8g --conf spark.executor.cores=4 --conf spark.submit.deployMode=client --conf spark.kubernetes.container.image=harbor.mycompany.com/dap/zeppelin-executor:3.3 
/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar zeppelin-server.spark.svc 12320 spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 12321:12321

       

      As you can see, the config value `spark.driver.host` is `spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc`, which is correct.
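To double-check which values actually arrive on the command line, the `--conf` pairs can be parsed straight out of the logged launch command. A minimal sketch in plain Python; the shortened command string below is a hypothetical stand-in for the full one above:

```python
# Sketch: extract --conf key=value pairs from a spark-submit launch command.
# The command string here is a shortened, hypothetical stand-in for the real one.
import shlex

def parse_spark_confs(launch_command: str) -> dict:
    """Return a dict of all --conf key=value pairs in a spark-submit command."""
    tokens = shlex.split(launch_command)
    confs = {}
    for i, tok in enumerate(tokens):
        if tok == "--conf" and i + 1 < len(tokens):
            key, _, value = tokens[i + 1].partition("=")
            confs[key] = value
    return confs

cmd = (
    "spark-submit --conf spark.driver.host=spark8g8g-isolated.spark.svc "
    "--conf spark.driver.port=22321 --conf spark.app.name=spark_8g_8g"
)
print(parse_spark_confs(cmd)["spark.driver.host"])  # spark8g8g-isolated.spark.svc
```

Run against the full command from the log, this confirms `spark.driver.host` is still correct at submit time, so the change must happen later, inside the driver process.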

      During start-up, the host seems to change. New name:

      spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc 

      The new name is the host name of the other cron job running in parallel. How is it possible that the Spark driver host changes? Is Zeppelin even capable of doing this?
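For illustration only, and not a claim about Zeppelin's actual code: this kind of cross-talk is typical when parallel launches mutate one shared configuration object instead of a per-process copy. All names in this sketch are hypothetical:

```python
# Illustrative only: two "jobs" sharing one mutable config dict clobber each
# other's spark.driver.host. This is NOT Zeppelin's actual implementation.
import threading

shared_conf = {}   # one dict shared by both launches (the bug in this sketch)
captured = {}      # what each job ends up seeing

def launch(job_name: str, driver_host: str, barrier: threading.Barrier):
    shared_conf["spark.driver.host"] = driver_host   # write into shared state
    barrier.wait()                                   # both jobs have written by now
    captured[job_name] = shared_conf["spark.driver.host"]  # read back later

barrier = threading.Barrier(2)
t1 = threading.Thread(target=launch, args=("spark_8g_8g", "spark8g8g.spark.svc", barrier))
t2 = threading.Thread(target=launch, args=("spark_2g_4g", "spark2g4g.spark.svc", barrier))
t1.start(); t2.start(); t1.join(); t2.join()

# Only one value can survive in the shared dict, so at least one job
# now sees the other job's driver host.
print(captured)
```

The fix for this failure mode is to give each launch its own copy of the configuration (e.g. `dict(shared_conf)`) before mutating it.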

      
      INFO [2023-04-11 00:00:04,288] ({RegisterThread} RemoteInterpreterServer.java[run]:620) - Start registration
      INFO [2023-04-11 00:00:04,288] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:200) - Launching ThriftServer at 10.129.4.191:12321
      INFO [2023-04-11 00:00:05,409] ({RegisterThread} RemoteInterpreterServer.java[run]:634) - Registering interpreter process
      INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:636) - Registered interpreter process
      INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:657) - Registration finished
      WARN [2023-04-11 00:00:05,517] ({pool-3-thread-1} ZeppelinConfiguration.java[<init>]:87) - Failed to load XML configuration, proceeding with a default,for a stacktrace activate the debug log
      INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:137) - Server Host: 127.0.0.1
      INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:144) - Zeppelin Version: 0.11.0-SNAPSHOT
      INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:141) - Server Port: 8080
      INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:143) - Context Path: /
      INFO [2023-04-11 00:00:05,531] ({pool-3-thread-1} RemoteInterpreterServer.java[createLifecycleManager]:293) - Creating interpreter lifecycle manager: org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
      INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} RemoteInterpreterServer.java[init]:236) - Creating RemoteInterpreterEventClient with connection pool size: 100
      INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[onInterpreterProcessStarted]:73) - Interpreter process: spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 is started
      INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[<init>]:67) - TimeoutLifecycleManager is started with checkInterval: 60000, timeoutThreshold: 3600000
      INFO [2023-04-11 00:00:05,627] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,635] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,645] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,655] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.IPySparkInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,663] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkRInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,670] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkIRInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,679] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkShinyInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,753] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.KotlinSparkInterpreter, isForceShutdown: true
      INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_688737023
      INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[<init>]:56) - Scheduler Thread Pool Size: 100
      INFO [2023-04-11 00:00:05,810] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractScheduler.java[runJob]:127) - Job 20210622-101638_112853005 started by scheduler interpreter_688737023
      INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_839216362
      INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetParallelScheduler]:88) - Create ParallelScheduler: org.apache.zeppelin.spark.SparkSqlInterpreter1135593921 with maxConcurrency: 10
      INFO [2023-04-11 00:00:05,857] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkInterpreter.java[extractScalaVersion]:279) - Using Scala: version 2.12.15
      INFO [2023-04-11 00:00:05,881] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:182) - Scala shell repl output dir: /tmp/spark16004603505225443508
      INFO [2023-04-11 00:00:06,113] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:191) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/scala-2.12/spark-scala-2.12-0.11.0-SNAPSHOT.jar
      INFO [2023-04-11 00:00:11,260] ({FIFOScheduler-interpreter_688737023-Worker-1} HiveConf.java[findConfigFile]:187) - Found configuration file file:/opt/conda/lib/python3.9/site-packages/pyspark/conf/hive-site.xml
      INFO [2023-04-11 00:00:11,438] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Running Spark version 3.3.0
      INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - No custom resources configured for spark.driver.
      INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
      INFO [2023-04-11 00:00:11,471] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
      INFO [2023-04-11 00:00:11,473] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Submitted application: spark_8g_8g
      INFO [2023-04-11 00:00:11,500] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 8192, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
      INFO [2023-04-11 00:00:11,512] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Limiting resource is cpus at 4 tasks per executor
      INFO [2023-04-11 00:00:11,515] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added ResourceProfile id: 0
      INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls to: zeppelin,ejavaheri
      INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls to: zeppelin,ejavaheri
      INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(zeppelin, ejavaheri); groups with view permissions: Set(); users  with modify permissions: Set(zeppelin, ejavaheri); groups with modify permissions: Set()
      INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls groups to:
      INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls groups to:
      INFO [2023-04-11 00:00:11,852] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'sparkDriver' on port 22321.
      INFO [2023-04-11 00:00:11,880] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering MapOutputTracker
      INFO [2023-04-11 00:00:11,912] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMaster
      INFO [2023-04-11 00:00:11,946] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
      INFO [2023-04-11 00:00:11,947] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - BlockManagerMasterEndpoint up
      INFO [2023-04-11 00:00:11,950] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMasterHeartbeat
      INFO [2023-04-11 00:00:11,975] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Created local directory at /tmp/blockmgr-1903d257-be01-4cb7-954f-9a5c13ab0598
      INFO [2023-04-11 00:00:11,993] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - MemoryStore started with capacity 4.6 GiB
      INFO [2023-04-11 00:00:12,010] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering OutputCommitCoordinator
      INFO [2023-04-11 00:00:12,079] ({FIFOScheduler-interpreter_688737023-Worker-1} Log.java[initialized]:170) - Logging initialized @9839ms to org.sparkproject.jetty.util.log.Slf4jLog
      INFO [2023-04-11 00:00:12,193] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:375) - jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.17+8-post-Ubuntu-1ubuntu220.04
      INFO [2023-04-11 00:00:12,223] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:415) - Started @9983ms
      INFO [2023-04-11 00:00:12,273] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractConnector.java[doStart]:333) - Started ServerConnector@325be8be{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
      INFO [2023-04-11 00:00:12,274] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'SparkUI' on port 4040.
      INFO [2023-04-11 00:00:12,310] ({FIFOScheduler-interpreter_688737023-Worker-1} ContextHandler.java[doStart]:921) - Started o.s.j.s.ServletContextHandler@47745fce{/,null,AVAILABLE,@Spark}
      INFO [2023-04-11 00:00:12,342] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added JAR file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar at spark://spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc:22321/jars/spark-interpreter-0.11.0-SNAPSHOT.jar with timestamp 1681164011433
      INFO [2023-04-11 00:00:12,413] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Auto-configuring K8S client using current context from users K8S config file
      

People

    Assignee: Philipp Dallig
    Reporter: Philipp Dallig
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved: