Details
Type: Bug
Status: Open
Priority: Blocker
Resolution: Unresolved
Affects Version/s: 0.10.1
Fix Version/s: None
Component/s: None
Description
Hello!
We discovered a bug while deploying Zeppelin on Kubernetes with the Spark interpreter also running in Kubernetes mode (the full log is also attached to this issue):
INFO [2023-06-09 07:59:49,053] ({FIFOScheduler-interpreter_537057292-Worker-1} PythonInterpreter.java[bootstrapInterpreter]:562) - Bootstrap interpreter via python/zeppelin_pyspark.py
ERROR [2023-06-09 07:59:50,416] ({FIFOScheduler-interpreter_537057292-Worker-1} PySparkInterpreter.java[open]:104) - Fail to bootstrap pyspark
java.io.IOException: Fail to run bootstrap script: python/zeppelin_pyspark.py
%text Fail to execute line 54: sqlc = __zSqlc__ = __zSpark__._wrapped
Traceback (most recent call last):
  File "/tmp/python129188975973677791/zeppelin_python.py", line 162, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 54, in <module>
AttributeError: 'SparkSession' object has no attribute '_wrapped'

    at org.apache.zeppelin.python.PythonInterpreter.bootstrapInterpreter(PythonInterpreter.java:579)
    at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:102)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:844)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:752)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
    at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
    at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
INFO [2023-06-09 07:59:50,418] ({FIFOScheduler-interpreter_537057292-Worker-1} PySparkInterpreter.java[close]:112) - Close PySparkInterpreter
INFO [2023-06-09 07:59:50,418] ({FIFOScheduler-interpreter_537057292-Worker-1} PythonInterpreter.java[close]:259) - Kill python process
INFO [2023-06-09 07:59:50,423] ({FIFOScheduler-interpreter_537057292-Worker-1} AbstractScheduler.java[runJob]:154) - Job 20230605-095507_490419065 finished by scheduler interpreter_537057292 with status ERROR
WARN [2023-06-09 07:59:50,425] ({Exec Default Executor} ProcessLauncher.java[onProcessFailed]:134) - Process with cmd [python, /tmp/python129188975973677791/zeppelin_python.py, 10.165.178.231, 41565] is failed due to
org.apache.commons.exec.ExecuteException: Process exited with an error: 143 (Exit value: 143)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
    at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
    at java.lang.Thread.run(Thread.java:750)
INFO [2023-06-09 07:59:50,426] ({Exec Default Executor} ProcessLauncher.java[transition]:109) - Process state is transitioned to TERMINATED
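The failing statement is line 54 of the bootstrap script, `sqlc = __zSqlc__ = __zSpark__._wrapped`, which reads a private `SparkSession` attribute that the PySpark 3.4.0 in this environment no longer has. For anyone who must stay on 0.10.1, one possible local patch is to fall back to building an `SQLContext` when `_wrapped` is missing. This is a sketch under stated assumptions, not the change that landed on Zeppelin master: the helper `resolve_sql_context` is hypothetical, and the stub classes only simulate an old and a new session so the fallback logic can be run without Spark.

```python
from typing import Any, Callable

# Sentinel so that a legitimately falsy `_wrapped` value is still returned as-is.
_MISSING = object()


def resolve_sql_context(spark: Any, make_sql_context: Callable[[Any], Any]) -> Any:
    """Hypothetical helper: prefer the legacy private `_wrapped` attribute,
    otherwise build a context with the supplied factory."""
    wrapped = getattr(spark, "_wrapped", _MISSING)
    if wrapped is not _MISSING:
        return wrapped  # older PySpark: reuse the session's internal SQLContext
    return make_sql_context(spark)  # PySpark 3.4.0 here: attribute is gone


# Assumed (untested) wiring into python/zeppelin_pyspark.py, replacing line 54:
#   from pyspark.sql import SQLContext
#   sqlc = __zSqlc__ = resolve_sql_context(
#       __zSpark__, lambda s: SQLContext(s.sparkContext, s))


class OldSession:
    """Stub standing in for a PySpark session that still exposes `_wrapped`."""
    _wrapped = "legacy-sqlcontext"


class NewSession:
    """Stub standing in for a PySpark 3.4.0 session without `_wrapped`."""


if __name__ == "__main__":
    print(resolve_sql_context(OldSession(), lambda s: "fallback"))  # legacy-sqlcontext
    print(resolve_sql_context(NewSession(), lambda s: "fallback"))  # fallback
```

`SQLContext` has been deprecated since PySpark 3.0, so the fallback path may emit a deprecation warning; moving to 0.11.0 remains the cleaner route.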
Another user reported exactly the same problem on Stack Overflow: https://stackoverflow.com/q/75949679
This is how I configure Spark in the notebook on the Zeppelin server (see also the attached screenshots):
%spark.conf
spark.executor.instances 5
spark.kubernetes.container.image.pullSecrets docker-registry
spark.app.name SparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTEST
spark.jars.ivy /tmp
The Spark interpreter starts, then launches the Spark executors (all of them in K8s), and then raises the error above.
The executors do not fail after the error; the Zeppelin server simply cannot connect to the interpreter and prints the same error in the notebook cell.
Environment:
- Zeppelin version: 0.10.1
- Spark version: 3.4.0
- Python version: 3.11
- The Dockerfile for the Zeppelin interpreter is attached.
Current workaround: we use version 0.11.0-SNAPSHOT and build Docker images from the Zeppelin sources on the latest master branch on GitHub. The bug does not occur in this newer version.
So the questions are:
- Will a patch for this and similar issues be released? If so, when?
- Can the issue be fixed in the current version?
Thank you! Any help will be appreciated.