[FLINK-32212] Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.16.1
    • Fix Version/s: None
    • Component/s: Runtime / Task
    • Labels: None
    • Environment: Apache Flink Kubernetes Operator 1.4

    Description

      After running for a few hours, the job starts throwing an IllegalStateException and I can't figure out why. To recover, I have to manually delete the FlinkDeployment so that it is recreated, and then redeploy everything.
      The jar is built into the Docker image, so it is referenced as described in the Operator's documentation:

      // jarURI: local:///opt/flink/usrlib/my-job.jar 

      I also tried moving it to /opt/flink/lib/my-job.jar, but that didn't work either.

       

      // Source: my-topic (1/2)#30587 (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) switched from DEPLOYING to FAILED with failure cause: java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
      old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
      new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
          at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
          at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
          at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
          at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
          at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
          at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
          at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
          at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
          at java.base/java.lang.Thread.run(Unknown Source) 
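
      For readers unfamiliar with the message above: it reports that a task asked for a class loader using a set of library BLOB keys that differs from the set already registered for the same job. The following is a minimal Java sketch of that kind of consistency check. It is an illustration only, not Flink's actual BlobLibraryCacheManager code; the class and method names are hypothetical, and plain strings stand in for the BLOB keys.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a "same set of library keys" check.
// This is NOT Flink's implementation; names and types are assumptions.
class LibraryCheckSketch {
    // Keys recorded by the first registration for a job.
    private final Set<String> registeredKeys;

    LibraryCheckSketch(Collection<String> firstRegistration) {
        this.registeredKeys = new HashSet<>(firstRegistration);
    }

    // Throws if a later registration references a different set of keys.
    void verify(Collection<String> requestedKeys) {
        Set<String> requested = new HashSet<>(requestedKeys);
        if (!registeredKeys.equals(requested)) {
            throw new IllegalStateException(
                    "different set of library BLOBs: old:" + registeredKeys
                            + " new:" + requested);
        }
    }

    public static void main(String[] args) {
        // The two keys from the report share a prefix but differ in the suffix.
        String oldKey =
                "p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396";
        String newKey =
                "p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2";

        LibraryCheckSketch check = new LibraryCheckSketch(List.of(oldKey));
        check.verify(List.of(oldKey)); // same set: passes silently
        try {
            check.verify(List.of(newKey)); // different set: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      In this sketch, any difference between the two sets is treated as an error, so a task carrying the "new" key is rejected as long as the entry holding the "old" key is still registered.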

      If there is any other information that can help to identify the problem, please let me know.

       

          People

            Assignee: Unassigned
            Reporter: Matheus Felisberto (mtfelisb)
            Votes: 4
            Watchers: 11
