YUNIKORN-1708 (Apache YuniKorn)

Filtered owner references for placeholder pods.


Details

    Description

      In the AWS EMR on EKS service, the real driver pod's ownerReference is a ConfigMap,
      and the placeholder pod's ownerReference is the same driver ConfigMap.

      When a user cancels an emr-containers job, the job-submitter is terminated,
      but the placeholder pod remains in the Pending state.
      https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks.html
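
To make the idea in the issue title concrete, here is a minimal sketch of what "filtering" owner references for a placeholder could look like. It is not YuniKorn's actual implementation: the allow-list `ALLOWED_OWNER_KINDS`, the function `filter_owner_references`, and the fallback to the originator pod are all assumptions made for illustration, and the originator pod name and uid in the usage lines are hypothetical.

```python
from typing import Dict, Iterable, List

OwnerRef = Dict[str, object]

# Hypothetical allow-list: owner kinds a placeholder is allowed to inherit.
# A ConfigMap is deliberately absent, because in the reported scenario the
# placeholder stays Pending while owned only by the driver ConfigMap.
ALLOWED_OWNER_KINDS = frozenset({"Pod", "Job"})


def filter_owner_references(
    refs: Iterable[OwnerRef],
    originator: Dict[str, str],
    allowed_kinds: frozenset = ALLOWED_OWNER_KINDS,
) -> List[OwnerRef]:
    """Return the owner references a placeholder pod would be created with.

    References whose kind is not in allowed_kinds are dropped. If nothing
    is left, the placeholder is owned by the originator pod itself
    (an assumption of this sketch), so deleting that pod also lets the
    garbage collector remove the placeholder.
    """
    kept = [ref for ref in refs if ref.get("kind") in allowed_kinds]
    if kept:
        return kept
    return [{
        "apiVersion": "v1",
        "kind": "Pod",
        "name": originator["name"],
        "uid": originator["uid"],
        "controller": False,
        "blockOwnerDeletion": True,
    }]


# Usage: the ConfigMap reference is taken from the placeholder spec in this
# issue; the originator pod name and uid are hypothetical.
driver_refs = [{
    "apiVersion": "batch/v1",
    "kind": "ConfigMap",
    "name": "000000031tu35ohgkc6-spark-defaults",
    "uid": "a3044750-c8b5-47b4-9efa-81bd4b064798",
    "controller": False,
    "blockOwnerDeletion": True,
}]
placeholder_refs = filter_owner_references(
    driver_refs, {"name": "example-driver", "uid": "example-uid"}
)
```

With this filter, a placeholder created for the driver above would reference the originator pod rather than the ConfigMap. Which owner kinds are safe to keep is a design decision this sketch does not settle.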

       

      Environment

      • EKS 1.22
      • EMR 6.9 release (Spark 3.3.0)
      • Yunikorn 1.2
      • gang scheduling enabled

       

      placeholders event log

      Events:
        Type     Reason              Age                   From       Message
        ----     ------              ----                  ----       -------
        Normal   Scheduling          19m                   yunikorn   namespace/tg-driver-spark-000000031ttjn13iom3-0 is queued and waiting for allocation
        Normal   PodUnschedulable    19m                   yunikorn   Task namespace/tg-driver-spark-000000031ttjn13iom3-0 is pending for the requested resources become available
        Warning  FailedProvisioning  19m                   karpenter  Failed to provision new node
      

       

      placeholders spec

      apiVersion: v1
      kind: Pod
      metadata:
        name: tg-driver-spark-000000031tu35ohgkc6-0
        namespace: namespace
        uid: 80601a03-565c-4d0e-88c7-8c66b590871e
        resourceVersion: '546358515'
        creationTimestamp: '2023-04-26T15:06:06Z'
        labels:
          applicationId: spark-000000031tu35ohgkc6
          placeholder: 'true'
          queue: root.beta
        annotations:
          yunikorn.apache.org/placeholder: 'true'
          yunikorn.apache.org/schedulingPolicyParameters: placeholderTimeoutSeconds=300
          yunikorn.apache.org/task-group-name: driver
          yunikorn.apache.org/task-groups: >-
            [{"name": "driver","minResource":{"cpu":
            "1","memory":"2Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}},{"name":
            "executor","minResource":{"cpu":
            "1","memory":"5Gi"},"minMember":1,"nodeSelector":{"karpenter.sh/provisioner-name":"test"}}]
        ownerReferences:
          - apiVersion: batch/v1
            kind: ConfigMap
            name: 000000031tu35ohgkc6-spark-defaults
            uid: a3044750-c8b5-47b4-9efa-81bd4b064798
            controller: false
            blockOwnerDeletion: true
        managedFields:
          - manager: k8s_yunikorn_scheduler
            operation: Update
            apiVersion: v1
            time: '2023-04-26T15:06:08Z'
            fieldsType: FieldsV1
            fieldsV1:
              f:status:
                f:conditions:
                  .: {}
                  k:{"type":"PodScheduled"}:
                    .: {}
                    f:lastProbeTime: {}
                    f:lastTransitionTime: {}
                    f:message: {}
                    f:reason: {}
                    f:status: {}
                    f:type: {}
            subresource: status
        selfLink: >-
          /api/v1/namespaces/namespace/pods/tg-driver-spark-000000031tu35ohgkc6-0
      status:
        phase: Pending
        conditions:
          - type: PodScheduled
            status: 'False'
            lastProbeTime: null
            lastTransitionTime: '2023-04-26T15:06:08Z'
            reason: Unschedulable
            message: request is waiting for cluster resources become available
        qosClass: Burstable
      spec:
        volumes:
          - name: kube-api-access-gvxxk
            projected:
              sources:
                - serviceAccountToken:
                    expirationSeconds: 3607
                    path: token
                - configMap:
                    name: kube-root-ca.crt
                    items:
                      - key: ca.crt
                        path: ca.crt
                - downwardAPI:
                    items:
                      - path: namespace
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.namespace
              defaultMode: 420
        containers:
          - name: pause
            image: registry.k8s.io/pause:3.7
            resources:
              requests:
                cpu: '1'
                memory: 2Gi
            volumeMounts:
              - name: kube-api-access-gvxxk
                readOnly: true
                mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            imagePullPolicy: IfNotPresent
        restartPolicy: Never
        terminationGracePeriodSeconds: 30
        nodeSelector:
          karpenter.sh/provisioner-name: test
        serviceAccountName: default
        serviceAccount: default
        securityContext:
          runAsUser: 1000
          runAsGroup: 3000
        schedulerName: yunikorn
        priority: 0
        preemptionPolicy: PreemptLowerPriority
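
The spec above sets `placeholderTimeoutSeconds=300`, while the event log shows a placeholder still pending after 19 minutes. The following diagnostic sketch flags such pods from a manifest that has already been loaded into a Python dict. It rests on assumptions: that the `schedulingPolicyParameters` annotation holds whitespace-separated `key=value` pairs, and that measuring from `creationTimestamp` is an acceptable approximation of when the timeout starts. The helper names `parse_policy_parameters` and `placeholder_overdue` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict

POLICY_ANNOTATION = "yunikorn.apache.org/schedulingPolicyParameters"


def parse_policy_parameters(value: str) -> Dict[str, str]:
    """Split 'k1=v1 k2=v2' into a dict (assumed annotation format)."""
    params = {}
    for item in value.split():
        key, sep, val = item.partition("=")
        if sep:
            params[key] = val
    return params


def placeholder_overdue(pod: dict, now: datetime) -> bool:
    """True if the pod is still Pending past its placeholder timeout.

    The timeout is counted from metadata.creationTimestamp, which is an
    approximation chosen for this sketch.
    """
    meta = pod["metadata"]
    annotations = meta.get("annotations", {})
    params = parse_policy_parameters(annotations.get(POLICY_ANNOTATION, ""))
    timeout = int(params.get("placeholderTimeoutSeconds", "0"))
    created = datetime.strptime(
        meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    pending = pod.get("status", {}).get("phase") == "Pending"
    return pending and timeout > 0 and (now - created).total_seconds() > timeout


# Usage with the values from the spec above.
pod = {
    "metadata": {
        "creationTimestamp": "2023-04-26T15:06:06Z",
        "annotations": {POLICY_ANNOTATION: "placeholderTimeoutSeconds=300"},
    },
    "status": {"phase": "Pending"},
}
overdue = placeholder_overdue(
    pod, datetime(2023, 4, 26, 15, 25, 6, tzinfo=timezone.utc)
)
```

A pod for which this returns true is a candidate for the behavior reported here: its timeout has elapsed, yet it has neither been replaced nor cleaned up.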
      

            People

              zhuqi Qi Zhu
              Swalloow Junyoung Park