  1. Flink
  2. FLINK-34347

Kubernetes native resource manager requests workers with the wrong resource spec.

Details

    Description

      We had a Flink spec in which the TaskManager (TM) CPU was set to 0.5, and we then upgraded it to 4.0. After the upgrade, we found the JobManager requesting TMs with both 0.5 CPU and 4 CPU. Most of the TMs with 0.5 CPU were released soon afterwards; however, one TM with 0.5 CPU remained and caused lag in the job.

       

      Logs for mixed TM requests:

      2024-02-03 10:10:41,414 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requested worker octopus-16-323-octopus-engine-write-proxy-taskmanager-3-244 with resource spec WorkerResourceSpec {cpuCores=4.0, taskHeapSize=5.637gb (6053219520 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=64.000mb (67108864 bytes), managedMemSize=0 bytes, numSlots=4}.
      2024-02-03 10:10:44,844 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.5, taskHeapSize=1.137gb (1221381320 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=64.000mb (67108864 bytes), managedMemSize=0 bytes, numSlots=4}, current pending count: 1.
      2024-02-03 10:10:44,920 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.5, taskHeapSize=1.137gb (1221381320 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=64.000mb (67108864 bytes), managedMemSize=0 bytes, numSlots=4}, current pending count: 2.

      The name of the TM with the wrong spec: octopus-16-323-octopus-engine-write-proxy-taskmanager-3-326.
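
To help spot such requests in a larger JobManager log, the check can be sketched as a small script. This is only an illustration that assumes the log lines keep the ActiveResourceManager format quoted in this report. The function name `find_mismatched_requests` and the constant `SAMPLE_LOG` are hypothetical helpers, not part of Flink.

```python
import re

# Matches ActiveResourceManager request lines in the format quoted in this
# report and captures the timestamp, the worker name (if logged) and cpuCores.
REQUEST_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?"
    r"(?:Requested worker (?P<worker>\S+)|Requesting new worker) "
    r"with resource spec WorkerResourceSpec \{cpuCores=(?P<cpu>[\d.]+)"
)


def find_mismatched_requests(log_text, expected_cpu):
    """Return (timestamp, worker or None, cpuCores) for every worker request
    whose cpuCores differs from expected_cpu."""
    mismatches = []
    for match in REQUEST_RE.finditer(log_text):
        cpu = float(match.group("cpu"))
        if cpu != expected_cpu:
            mismatches.append((match.group("ts"), match.group("worker"), cpu))
    return mismatches


# Log lines copied from this report (hypothetical constant name).
SAMPLE_LOG = "\n".join([
    "2024-02-03 10:10:41,414 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - "
    "Requested worker octopus-16-323-octopus-engine-write-proxy-taskmanager-3-244 with resource spec "
    "WorkerResourceSpec {cpuCores=4.0, taskHeapSize=5.637gb (6053219520 bytes), "
    "taskOffHeapSize=1024.000mb (1073741824 bytes), networkMemSize=64.000mb (67108864 bytes), "
    "managedMemSize=0 bytes, numSlots=4}.",
    "2024-02-03 10:10:44,844 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - "
    "Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.5, "
    "taskHeapSize=1.137gb (1221381320 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), "
    "networkMemSize=64.000mb (67108864 bytes), managedMemSize=0 bytes, numSlots=4}, current pending count: 1.",
    "2024-02-03 10:10:44,920 INFO  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - "
    "Requesting new worker with resource spec WorkerResourceSpec {cpuCores=0.5, "
    "taskHeapSize=1.137gb (1221381320 bytes), taskOffHeapSize=1024.000mb (1073741824 bytes), "
    "networkMemSize=64.000mb (67108864 bytes), managedMemSize=0 bytes, numSlots=4}, current pending count: 2.",
])

if __name__ == "__main__":
    # After the upgrade, every request is expected to use 4.0 CPU cores.
    for ts, worker, cpu in find_mismatched_requests(SAMPLE_LOG, expected_cpu=4.0):
        print(ts, worker or "<new worker, name not logged yet>", cpu)
```

Note that "Requesting new worker" lines do not carry a worker name, so mapping a stale request to a concrete pod (such as the TM named above) would still need the later "Requested worker" line from the full log.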

      Relevant logs are attached.

          People

            Assignee: Unassigned
            Reporter: Ruibin Xing (ruibin)