Zeppelin / ZEPPELIN-5279

Paragraphs terminate after 1 hour

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.2
    • Fix Version/s: None
    • Component/s: Interpreters, pySpark, spark
    • Labels: None

    Description

      I am using Zeppelin 0.8.2 in a Docker container running Ubuntu 16.04.

      When I run a paragraph using pyspark, every run stops after exactly 1 hour, and a message similar to the following appears at the bottom of the paragraph:

      `Took 1 hrs 0 min 0 sec. Last updated by anonymous at February 09 2021, 10:24:25 PM.`

      This message is followed by the error output shown at the bottom of this description.

      In the background the jobs are still running on Spark; however, it seems that the pyspark interpreter loses its connection to the processes after 1 hour.

      I tried changing the `zeppelin.interpreter.lifecyclemanager.timeout.threshold` property in `zeppelin-site.xml`, but this has no effect on the issue, even though the changed value is clearly visible in the web UI (screenshot: https://i.stack.imgur.com/3T3zOl.png). Other values set in that XML file are read and applied by Zeppelin. I also verified that the variables defined in `zeppelin-env.sh` do not conflict with values set in the XML file.
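A minimal sketch for double-checking which lifecycle manager properties a `zeppelin-site.xml` actually contains, assuming the file uses the usual `<configuration><property><name>…</name><value>…</value></property></configuration>` layout. The helper name `lifecycle_properties` and the sample XML (including the value `7200000`) are illustrative assumptions, not taken from the report.

```python
import xml.etree.ElementTree as ET

# Prefix shared by the lifecycle manager properties mentioned in the report.
PREFIX = "zeppelin.interpreter.lifecyclemanager."


def lifecycle_properties(xml_text, prefix=PREFIX):
    """Return {name: value} for every <property> whose name starts with prefix.

    Hypothetical helper for illustration; it only reads the XML text and does
    not talk to a running Zeppelin server.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith(prefix):
            found[name] = value
    return found


# Illustrative sample only; the values here are made up, not from the report.
SAMPLE = """<configuration>
  <property>
    <name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
    <value>7200000</value>
  </property>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in lifecycle_properties(SAMPLE).items():
        print(f"{name} = {value}")
```

To check a real installation, pass the contents of your own `zeppelin-site.xml` instead of `SAMPLE`. This only confirms what is written in the file; it does not show which value the interpreter process ends up using.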

      Paragraph output after timing out:

      org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:274)
        at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:258)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:233)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter$4.call(RemoteInterpreter.java:229)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreterProcess.callRemoteFunction(RemoteInterpreterProcess.java:135)
        at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:228)
        at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:449)
        at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
        at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:315)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


      Attachments

        1. zeppelin-interpreter-spark-s-z-0.log (674 kB, Noam)
        2. zeppelin-s-z-0.log (9 kB, Noam)

          People

            Assignee: Unassigned
            Reporter: Almog Noam
            Votes: 0
            Watchers: 3
