Apache YuniKorn / YUNIKORN-2526

Discrepancy between shim cache and core app/task list after scheduler restart


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: shim - kubernetes
    • Labels: None

    Description

      When the scheduler restarts, it occasionally ends up in a state where an application is still reported as Running even though the application has already terminated in the cluster. The attached state dump confirms this.

       

      The scheduler core logs (also attached) show that every node is being evaluated for the non-existent application. This unnecessary evaluation consumes CPU.

      Attachments

        1. logs-2be04314-bed0-4385-9ae7-50ed0ef9d9d5.txt.zip (35 kB, Shravan Achar)
        2. logs-49f01ed0-3473-4521-b11f-80e27adb7250.txt.zip (79 kB, Shravan Achar)
        3. logs-complete-post.txt.zip (8.03 MB, Shravan Achar)
        4. log-snippet.txt (2 kB, Shravan Achar)
        5. logs-since-restart.txt (132 kB, Shravan Achar)
        6. state-dump-4-1-3.json (41.00 MB, Shravan Achar)
        7. state-dump-4-17.json.zip (1.81 MB, Shravan Achar)


            People

              Peter Bacsko (pbacsko)
              Shravan Achar (shravan-achar)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated: