Zeppelin / ZEPPELIN-5598

Deadlock in RemoteInterpreterServer close / cancel

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.10.0
    • Fix Version/s: None
    • Component/s: zeppelin-interpreter
    • Labels: None

    Description

      With the interpreter binding mode set to isolated per user and scoped per note, we run into a deadlock during interpreter shutdown.

      Unfortunately, we cannot provide a full thread dump because the interpreter was running in a container without jstack. However, the thread overview of the driver process was available in the corresponding Spark UI. It shows more than 100 ShutdownThreads, all blocked as follows:

      Thread 15800	BLOCKED	
      Blocked by Thread 24 
      
      Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
      
      org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$ShutdownThread.run(RemoteInterpreterServer.java:664)
      
      Thread 24	BLOCKED	
      Blocked by Thread 14066 
      
      Lock(org.apache.zeppelin.spark.PySparkInterpreter@554374428})
      Lock(java.util.concurrent.ThreadPoolExecutor$Worker@188315435}), Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
      
      org.apache.zeppelin.interpreter.LazyOpenInterpreter.close(LazyOpenInterpreter.java:91)
      org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.close(RemoteInterpreterServer.java:487) => holding Monitor(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
      org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1757)
      org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$close.getResult(RemoteInterpreterService.java:1736)
      org.apache.zeppelin.shaded.org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
      org.apache.zeppelin.shaded.org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
      org.apache.zeppelin.shaded.org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:313)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      java.lang.Thread.run(Thread.java:748)
      
      Thread  14066	BLOCKED	
      Blocked by Thread 24 
      
      Lock(org.apache.zeppelin.interpreter.InterpreterGroup@2125781891})
      Monitor(org.apache.zeppelin.spark.PySparkInterpreter@554374428}), Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
      
      org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:292)
      org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:333)
      org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:90)
      org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70) => holding Monitor(org.apache.zeppelin.interpreter.LazyOpenInterpreter@1986146112})
      org.apache.zeppelin.interpreter.LazyOpenInterpreter.cancel(LazyOpenInterpreter.java:118)
      org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.lambda$cancel$1(RemoteInterpreterServer.java:933)
      org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$$Lambda$5415/1522577174.run(Unknown Source)
      java.lang.Thread.run(Thread.java:748) 
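
Read together, the three stacks suggest a lock-order inversion: Thread 24 (close) holds the InterpreterGroup monitor and waits for the PySparkInterpreter monitor, while Thread 14066 (cancel, which triggers a lazy open) holds the PySparkInterpreter monitor and waits for the InterpreterGroup monitor. The ShutdownThreads then queue up behind the InterpreterGroup monitor. The following is a minimal sketch of that pattern, not Zeppelin code: the class LockOrderSketch, its methods, and the plain Object locks are hypothetical stand-ins, and only the two monitors that form the cycle are modeled. It also shows that acquiring the group monitor first on both paths avoids the cycle, which is one possible direction and not necessarily the fix the project would choose. Because jstack was not available, the sketch detects the deadlock from inside the JVM with ThreadMXBean.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockOrderSketch {

    // Starts a daemon thread that locks `first`, gives its peer time to lock
    // its own first monitor, then locks `second` while still holding `first`.
    static Thread nested(String name, Object first, Object second,
                         CountDownLatch holdingFirst, CountDownLatch done) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                holdingFirst.countDown();
                try {
                    // Bounded wait, so the consistent-order case cannot hang here.
                    holdingFirst.await(1, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    done.countDown();
                }
            }
        }, name);
        t.setDaemon(true); // deadlocked threads must not keep the JVM alive
        t.start();
        return t;
    }

    // Runs a close-like and a cancel-like path and returns how many of the two
    // threads the JVM reports as monitor-deadlocked (0 if both finished, -1 on timeout).
    static int deadlockedAfter(boolean inverted) {
        Object group = new Object();       // stand-in for the InterpreterGroup monitor
        Object interpreter = new Object(); // stand-in for the PySparkInterpreter monitor
        CountDownLatch holdingFirst = new CountDownLatch(2);
        CountDownLatch done = new CountDownLatch(2);

        // close path: group -> interpreter, as in Thread 24
        Thread close = nested("close", group, interpreter, holdingFirst, done);
        // cancel path: interpreter -> group as in Thread 14066, or group first when ordered
        Thread cancel = inverted
                ? nested("cancel", interpreter, group, holdingFirst, done)
                : nested("cancel", group, interpreter, holdingFirst, done);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
        try {
            while (System.nanoTime() < deadline) {
                if (done.await(50, TimeUnit.MILLISECONDS)) {
                    return 0; // both paths completed, no deadlock
                }
                long[] ids = mx.findMonitorDeadlockedThreads();
                int ours = 0;
                if (ids != null) {
                    for (long id : ids) {
                        // Ignore deadlocked threads left over from earlier runs.
                        if (id == close.getId() || id == cancel.getId()) {
                            ours++;
                        }
                    }
                }
                if (ours > 0) {
                    return ours;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("inverted order: " + deadlockedAfter(true) + " deadlocked threads");
        System.out.println("consistent order: " + deadlockedAfter(false) + " deadlocked threads");
    }
}
```

On a HotSpot JVM the inverted run is expected to report both threads as deadlocked, and the consistent-order run is expected to report none. The same ThreadMXBean call could, in principle, be used from within an interpreter process to capture deadlock information when no JDK tools are installed in the container.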

       

          People

            Assignee: Unassigned
            Reporter: Andreas Weise (aweise)