Zeppelin / ZEPPELIN-3704

Scheduler.getJobsRunning() returns finished jobs


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.1, 0.9.0
    • Component/s: zeppelin-zengine
    • Labels: None

    Description

      Sometimes, when a cron schedule is configured with the "After execution stop the interpreter" setting enabled, the last paragraph is marked as ABORT for no apparent reason. I found that the cause of this behavior is that Scheduler.getJobsRunning() returns finished jobs. Has anyone else faced this problem?
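
      The behavior described above can be illustrated with a minimal sketch. This is not Zeppelin's code: `Status`, `Job`, `getJobsRunning` and `abortRunning` are hypothetical stand-ins, and the sketch assumes that a reasonable remedy is to re-check each tracked job's status before reporting it as running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunningJobsSketch {
    // Hypothetical stand-in for a job status; not Zeppelin's real enum.
    enum Status { PENDING, RUNNING, FINISHED, ABORT }

    // Hypothetical minimal job; a real job carries far more state.
    static final class Job {
        final String name;
        volatile Status status;

        Job(String name, Status status) {
            this.name = name;
            this.status = status;
        }
    }

    // Reports only the jobs whose status is still RUNNING at call time,
    // even if a finished job has not yet been removed from the tracked list.
    static List<Job> getJobsRunning(List<Job> tracked) {
        List<Job> running = new ArrayList<>();
        for (Job job : tracked) {
            if (job.status == Status.RUNNING) {
                running.add(job);
            }
        }
        return running;
    }

    // Mimics closing a session: aborts whatever getJobsRunning() reports.
    static void abortRunning(List<Job> tracked) {
        for (Job job : getJobsRunning(tracked)) {
            job.status = Status.ABORT;
        }
    }

    public static void main(String[] args) {
        List<Job> tracked = Arrays.asList(
                new Job("p1", Status.FINISHED),   // finished, but still tracked
                new Job("p2", Status.RUNNING));
        abortRunning(tracked);
        for (Job job : tracked) {
            System.out.println(job.name + " " + job.status);
        }
    }
}
```

      With the status filter in place, the finished job keeps its FINISHED status and only the job that is still running is aborted. Note that a filter like this narrows the window but does not remove a check-then-act race between the two threads.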

      Short log (with additional log info from TinkoffCreditSystems fork):

       INFO [2018-08-10 00:08:00,000] ({DefaultQuartzScheduler_Worker-47} Notebook.java[execute]:945) - Start schedule run note: 2C68U586U, cronExpr:"0 8 0 * * ?"
       INFO [2018-08-10 00:08:00,047] ({pool-2-thread-266} SchedulerFactory.java[jobStarted]:109) - Job 20170814-171621_1685490119 started by scheduler  
       INFO [2018-08-10 00:10:35,387] ({pool-2-thread-266} SchedulerFactory.java[jobFinished]:115) - Job 20170814-171621_1685490119 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-greenplum_pd:user:2C68U586U-shared_session
       INFO [2018-08-10 00:10:35,417] ({pool-2-thread-3838} SchedulerFactory.java[jobStarted]:109) - Job 20180402-171122_400058927 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
       INFO [2018-08-10 00:11:57,428] ({pool-2-thread-3838} SchedulerFactory.java[jobFinished]:115) - Job 20180402-171122_400058927 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
       INFO [2018-08-10 00:11:57,445] ({pool-2-thread-996} SchedulerFactory.java[jobStarted]:109) - Job 20180413-191933_1545337614 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
       INFO [2018-08-10 00:11:57,527] ({pool-2-thread-996} NotebookServer.java[afterStatusChange]:2631) - Job 20180413-191933_1545337614 is finished successfully, status: FINISHED
       INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47} Paragraph.java[execute]:343) - skip to run blank paragraph. 20180423-134725_1702290212
       INFO [2018-08-10 00:11:57,547] ({DefaultQuartzScheduler_Worker-47} Notebook.java[execute]:947) - End schedule run note: 2C68U586U
       INFO [2018-08-10 00:11:57,548] ({DefaultQuartzScheduler_Worker-47} ManagedInterpreterGroup.java[close]:100) - Close Session: shared_session for interpreter setting: spark
       INFO [2018-08-10 00:11:57,553] ({pool-2-thread-996} VFSNotebookRepo.java[save]:196) - Saving note:2C68U586U
      
      	The third job's status then changes from FINISHED to ABORT:
      
       WARN [2018-08-10 00:11:57,555] ({DefaultQuartzScheduler_Worker-47} NotebookServer.java[afterStatusChange]:2633) - Job 20180413-191933_1545337614 is finished, status: ABORT, exception: null, result: %text 'sometext'
       INFO [2018-08-10 00:11:57,577] ({pool-2-thread-996} SchedulerFactory.java[jobFinished]:115) - Job 20180413-191933_1545337614 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark:user:2C68U586U-shared_session
       INFO [2018-08-10 00:11:57,585] ({DefaultQuartzScheduler_Worker-47} ManagedInterpreterGroup.java[close]:130) - Job paragraph_1523636373190_-1466164905 aborted 
      

      Full log with debug messages:

       INFO [2018-08-10 17:31:37,193] ({pool-2-thread-123} NotebookServer.java[afterStatusChange]:2631) - Job 20180810-124513_1104099490 is finished successfully, status: FINISHED
       INFO [2018-08-10 17:31:37,215] ({pool-2-thread-123} VFSNotebookRepo.java[save]:196) - Saving note:2DNHBQ5N2
       INFO [2018-08-10 17:31:37,216] ({pool-2-thread-123} SchedulerFactory.java[jobFinished]:115) - Job 20180810-124513_1104099490 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark::2DNHBQ5N2-shared_session
       INFO [2018-08-10 17:31:37,228] ({pool-2-thread-131} SchedulerFactory.java[jobStarted]:109) - Job 20180810-132950_1064210956 started by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark::2DNHBQ5N2-shared_session
       INFO [2018-08-10 17:31:37,229] ({pool-2-thread-131} Paragraph.java[jobRun]:380) - Run paragraph [paragraph_id: 20180810-132950_1064210956, interpreter: spark.pyspark, note_id: 2DNHBQ5N2, user: user1]
       INFO [2018-08-10 17:31:38,207] ({pool-2-thread-35} Paragraph.java[jobRun]:460) - End of Run paragraph [paragraph_id: 20180810-124513_1104099490, interpreter: spark.pyspark, note_id: 2DNHBQ5N2, user: user1]
       INFO [2018-08-10 17:31:38,207] ({pool-2-thread-35} SchedulerFactory.java[jobFinished]:115) - Job 20180810-124513_1104099490 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark::2DNHBQ5N2-shared_session
       INFO [2018-08-10 17:31:38,224] ({pool-2-thread-131} Paragraph.java[jobRun]:460) - End of Run paragraph [paragraph_id: 20180810-132950_1064210956, interpreter: spark.pyspark, note_id: 2DNHBQ5N2, user: user1]
       INFO [2018-08-10 17:31:38,227] ({pool-2-thread-131} NotebookServer.java[afterStatusChange]:2631) - Job 20180810-132950_1064210956 is finished successfully, status: FINISHED
       INFO [2018-08-10 17:31:38,229] ({MyScheduler_Worker-5} Paragraph.java[execute]:343) - skip to run blank paragraph. 20180810-133022_784315150
       INFO [2018-08-10 17:31:38,230] ({MyScheduler_Worker-5} Notebook.java[execute]:947) - End schedule run note: 2DNHBQ5N2
       INFO [2018-08-10 17:31:38,230] ({MyScheduler_Worker-5} ManagedInterpreterGroup.java[close]:102) - Close Session: shared_session for interpreter setting: spark
       INFO [2018-08-10 17:31:38,230] ({MyScheduler_Worker-5} RemoteScheduler.java[getJobsRunning]:135) - 
      [DEBUG]
      		RemoteScheduler adds paragraph_1533896990379_-679637373 to running list, job status is FINISHED
      [DEBUG]
      
       INFO [2018-08-10 17:31:38,231] ({MyScheduler_Worker-5} ManagedInterpreterGroup.java[close]:132) - 
      [DEBUG]
      	job paragraph_1533896990379_-679637373 is instanceof paragraph
      [DEBUG]
      
       INFO [2018-08-10 17:31:38,231] ({MyScheduler_Worker-5} ManagedInterpreterGroup.java[close]:133) - 
      [DEBUG]
      	Job description before aborting:
      
      		ParagraphId: 20180810-132950_1064210956
      		Status: FINISHED
      		Config: {colWidth=12.0, fontSize=9.0, enabled=true, results={}, editorSetting={language=python, editOnDblClick=false, completionKey=TAB, completionSupport=true}, editorMode=ace/mode/python, editorHide=false, tableHide=true}
      		Json: {
        "text": "%spark.pyspark\nimport os\nimport shutil \n\nshutil.copy2(\u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/zeppelin-egklimov-ubuntu013.log\u0027, \u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/tmp/zeppelin-egklimov-ubuntu013.log\u0027)",
        "user": "user1",
        "dateUpdated": "2018-08-10 16:01:58.663",
        "config": {
          "colWidth": 12.0,
          "fontSize": 9.0,
          "enabled": true,
          "results": {},
          "editorSetting": {
            "language": "python",
            "editOnDblClick": false,
            "completionKey": "TAB",
            "completionSupport": true
          },
          "editorMode": "ace/mode/python",
          "editorHide": false,
          "tableHide": true
        },
        "settings": {
          "params": {},
          "forms": {}
        },
        "results": {
          "code": "SUCCESS",
          "msg": [
            {
              "type": "TEXT",
              "data": "\u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/tmp/zeppelin-egklimov-ubuntu013.log\u0027\n"
            }
          ]
        },
        "apps": [],
        "jobName": "paragraph_1533896990379_-679637373",
        "id": "20180810-132950_1064210956",
        "dateCreated": "2018-08-10 13:29:50.379",
        "dateStarted": "2018-08-10 17:31:37.229",
        "dateFinished": "2018-08-10 17:31:38.225",
        "status": "FINISHED",
        "progressUpdateIntervalMs": 500
      }  
      [DEBUG]
      
       INFO [2018-08-10 17:31:38,253] ({pool-2-thread-131} VFSNotebookRepo.java[save]:196) - Saving note:2DNHBQ5N2
       INFO [2018-08-10 17:31:38,254] ({pool-2-thread-131} SchedulerFactory.java[jobFinished]:115) - Job 20180810-132950_1064210956 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpreter-spark::2DNHBQ5N2-shared_session
       WARN [2018-08-10 17:31:38,262] ({MyScheduler_Worker-5} NotebookServer.java[afterStatusChange]:2633) - Job 20180810-132950_1064210956 is finished, status: ABORT, exception: null, result: %text '/home/egklimov/IdeaProjects/tcs-zeppelin/logs/tmp/zeppelin-egklimov-ubuntu013.log'
      
       INFO [2018-08-10 17:31:38,275] ({MyScheduler_Worker-5} VFSNotebookRepo.java[save]:196) - Saving note:2DNHBQ5N2
       INFO [2018-08-10 17:31:38,276] ({MyScheduler_Worker-5} ManagedInterpreterGroup.java[close]:165) - Job paragraph_1533896990379_-679637373 aborted 
       INFO [2018-08-10 17:31:38,277] ({MyScheduler_Worker-5} ManagedInterpreterGroup.java[close]:167) - 
      [DEBUG]
      
      	Job description after aborting:
      
      		ParagraphId: 20180810-132950_1064210956
      		Status: ABORT
      		Config: {colWidth=12.0, fontSize=9.0, enabled=true, results={}, editorSetting={language=python, editOnDblClick=false, completionKey=TAB, completionSupport=true}, editorMode=ace/mode/python, editorHide=false, tableHide=true}
      		Json: {
        "text": "%spark.pyspark\nimport os\nimport shutil \n\nshutil.copy2(\u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/zeppelin-egklimov-ubuntu013.log\u0027, \u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/tmp/zeppelin-egklimov-ubuntu013.log\u0027)",
        "user": "user1",
        "dateUpdated": "2018-08-10 16:01:58.663",
        "config": {
          "colWidth": 12.0,
          "fontSize": 9.0,
          "enabled": true,
          "results": {},
          "editorSetting": {
            "language": "python",
            "editOnDblClick": false,
            "completionKey": "TAB",
            "completionSupport": true
          },
          "editorMode": "ace/mode/python",
          "editorHide": false,
          "tableHide": true
        },
        "settings": {
          "params": {},
          "forms": {}
        },
        "results": {
          "code": "SUCCESS",
          "msg": [
            {
              "type": "TEXT",
              "data": "\u0027/home/egklimov/IdeaProjects/tcs-zeppelin/logs/tmp/zeppelin-egklimov-ubuntu013.log\u0027\n"
            }
          ]
        },
        "apps": [],
        "jobName": "paragraph_1533896990379_-679637373",
        "id": "20180810-132950_1064210956",
        "dateCreated": "2018-08-10 13:29:50.379",
        "dateStarted": "2018-08-10 17:31:37.229",
        "dateFinished": "2018-08-10 17:31:38.225",
        "status": "ABORT",
        "progressUpdateIntervalMs": 500
      }
      [DEBUG]
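
      In the log above, the paragraph reaches FINISHED at 17:31:38,227, the session close starts at 17:31:38,230, and the scheduler's jobFinished callback only fires at 17:31:38,254, so the abort lands on a job that has already completed. One way to make such an abort harmless is an atomic state transition that only succeeds while the job is still running. The sketch below is an assumption for illustration, not the fix that was applied in Zeppelin; `Job`, `finish` and `abort` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

public class AbortGuardSketch {
    // Hypothetical stand-in for a job status; not Zeppelin's real enum.
    enum Status { RUNNING, FINISHED, ABORT }

    // Hypothetical job whose status can only leave RUNNING once.
    static final class Job {
        private final AtomicReference<Status> status =
                new AtomicReference<>(Status.RUNNING);

        Status status() {
            return status.get();
        }

        // Succeeds only if the job is still RUNNING.
        boolean finish() {
            return status.compareAndSet(Status.RUNNING, Status.FINISHED);
        }

        // Succeeds only if the job is still RUNNING; a finished job is left alone.
        boolean abort() {
            return status.compareAndSet(Status.RUNNING, Status.ABORT);
        }
    }

    public static void main(String[] args) {
        Job job = new Job();
        job.finish();                  // worker thread completes the paragraph
        boolean aborted = job.abort(); // session close arrives slightly later
        System.out.println(aborted + " " + job.status()); // prints "false FINISHED"
    }
}
```

      Because both transitions go through compareAndSet, whichever thread moves the job out of RUNNING first wins, and a late abort cannot overwrite a FINISHED result.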
      

            People

              Assignee: George Klimov (egorklimov)
              Reporter: George Klimov (egorklimov)