Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
0.10.0
None
None
Description
With the interpreter binding set to Isolated per User + Scoped per Note, we run into a deadlock during interpreter shutdown.
Unfortunately, we can't provide a full thread dump because the interpreter was running in a container without jstack. Luckily, we found the Thread Overview of the driver process in the corresponding Spark UI. It shows more than 100 ShutdownThreads, all blocked as follows:
Thread 15800 BLOCKED
Blocked by Thread 24 Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
  org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$ShutdownThread.run(RemoteInterpreterServer.java:664)
Thread 24 BLOCKED
Blocked by Thread 14066 Lock(org.apache.zeppelin.spark.PySparkInterpreter@554374428})
Lock(java.util.concurrent.ThreadPoolExecutor$Worker@188315435}), Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
  org.apache.zeppelin.interpreter.LazyOpenInterpreter.close(LazyOpenInterpreter.java:91)
  org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:487) => holding Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
  org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1757)
  org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1736)
  org.apache.zeppelin.shaded.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
  org.apache.zeppelin.shaded.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
  org.apache.zeppelin.shaded.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  java.lang.Thread.run(Thread.java:748)
Thread 14066 BLOCKED
Blocked by Thread 24 Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
Monitor(org.apache.zeppelin.spark.PySparkInterpreter@554374428}), Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
  org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:292)
  org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:333)
  org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:90)
  org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70) => holding Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
  org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:118)
  org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$1(RemoteInterpreterServer.java:933)
  org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$$Lambda$5415/1522577174.run(Unknown Source)
  java.lang.Thread.run(Thread.java:748)
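
Reading the trace above, the cycle looks like a lock-order inversion. Thread 24 (the close call) holds the InterpreterGroup monitor and waits for the PySparkInterpreter monitor. Thread 14066 (the cancel call, which ends up in open) holds the PySparkInterpreter monitor and waits for the InterpreterGroup monitor. The ShutdownThreads then queue up behind the InterpreterGroup monitor. The following is only a minimal sketch of that pattern, not Zeppelin code: the class name, thread names, and the two plain Object monitors are hypothetical stand-ins. It also shows that the JDK's standard ThreadMXBean can report such a cycle from inside the JVM, which might help in environments where jstack is not available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockInversionSketch {

    // Starts a daemon thread that takes `first`, waits until the other thread
    // also holds its first monitor, then tries to take `second`.
    static void start(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads let the JVM exit despite the deadlock
        t.start();
    }

    // Provokes the inversion and returns how many threads the JVM reports as deadlocked.
    static int countDeadlockedThreads() {
        Object groupMonitor = new Object();       // hypothetical stand-in for InterpreterGroup
        Object interpreterMonitor = new Object(); // hypothetical stand-in for PySparkInterpreter
        CountDownLatch latch = new CountDownLatch(2);

        start("close-like", groupMonitor, interpreterMonitor, latch);  // order: group, interpreter
        start("cancel-like", interpreterMonitor, groupMonitor, latch); // order: interpreter, group

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        try {
            for (int i = 0; i < 100; i++) {
                long[] ids = mx.findDeadlockedThreads(); // null while no cycle exists
                if (ids != null) {
                    // Print each thread with the monitors it holds and waits for.
                    for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
                        System.out.print(info);
                    }
                    return ids.length;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In the sketch, the cycle disappears if both threads acquire the monitors in the same order. Whether that is a suitable fix for the close and cancel paths in RemoteInterpreterServer is not something this trace alone can confirm.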