Spark / SPARK-46128

External scheduler cannot be instantiated

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2, 3.5.0
    • None
    • None

    Description

      The spark-submit driver fails to resolve "kubernetes.default.svc" when trying to create executors on newly added worker nodes. The existing worker nodes do not show this issue.

      Spark versions tried:

      • 3.5.0
      • 3.1.2

      On-premises Kubernetes cluster set up with kubeadm:

      • Kubernetes version: v1.28.2
      • OS: Ubuntu 22.04.1 (Jammy)
      • Container Runtime: 1.6.24

      Complete error:

      + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
      + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.95.23 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
      WARNING: An illegal reflective access operation has occurred
      WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
      WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
      WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
      WARNING: All illegal access operations will be denied in a future release
      23/11/28 01:20:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      23/11/28 01:20:03 INFO SparkContext: Running Spark version 3.1.2
      23/11/28 01:20:03 INFO ResourceUtils: ==============================================================
      23/11/28 01:20:03 INFO ResourceUtils: No custom resources configured for spark.driver.
      23/11/28 01:20:03 INFO ResourceUtils: ==============================================================
      23/11/28 01:20:03 INFO SparkContext: Submitted application: Spark Pi
      23/11/28 01:20:03 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
      23/11/28 01:20:03 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
      23/11/28 01:20:03 INFO ResourceProfileManager: Added ResourceProfile id: 0
      23/11/28 01:20:03 INFO SecurityManager: Changing view acls to: 185,root
      23/11/28 01:20:03 INFO SecurityManager: Changing modify acls to: 185,root
      23/11/28 01:20:03 INFO SecurityManager: Changing view acls groups to:
      23/11/28 01:20:03 INFO SecurityManager: Changing modify acls groups to:
      23/11/28 01:20:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(185, root); groups with view permissions: Set(); users  with modify permissions: Set(185, root); groups with modify permissions: Set()
      23/11/28 01:20:04 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
      23/11/28 01:20:04 INFO SparkEnv: Registering MapOutputTracker
      23/11/28 01:20:04 INFO SparkEnv: Registering BlockManagerMaster
      23/11/28 01:20:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
      23/11/28 01:20:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
      23/11/28 01:20:04 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
      23/11/28 01:20:04 INFO DiskBlockManager: Created local directory at /var/data/spark-f0634fda-1366-4da1-8ac2-262e4bf9952b/blockmgr-7ac2193b-f7ad-4bc2-bdfa-386d2d3f4bf6
      23/11/28 01:20:04 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
      23/11/28 01:20:04 INFO SparkEnv: Registering OutputCommitCoordinator
      23/11/28 01:20:04 INFO Utils: Successfully started service 'SparkUI' on port 4040.
      23/11/28 01:20:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-96895e8c1382ff30-driver-svc.default.svc:4040
      23/11/28 01:20:04 INFO SparkContext: Added JAR local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar at file:/opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar with timestamp 1701134403914
      23/11/28 01:20:04 WARN SparkContext: The jar local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar has been added already. Overwriting of added jars is not supported in the current version.
      23/11/28 01:20:04 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
      23/11/28 01:20:24 ERROR SparkContext: Error initializing SparkContext.
      org.apache.spark.SparkException: External scheduler cannot be instantiated
          at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961)
          at org.apache.spark.SparkContext.<init>(SparkContext.scala:557)
          at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672)
          at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945)
          at scala.Option.getOrElse(Option.scala:189)
          at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
          at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
          at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
          at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.base/java.lang.reflect.Method.invoke(Unknown Source)
          at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
          at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
          at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
          at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
          at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
          at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [spark-pi-96895e8c1382ff30-driver]  in namespace: [default]  failed.
          at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
          at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84)
          at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75)
          at scala.Option.map(Option.scala:230)
          at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74)
          at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123)
          at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955)
          ... 19 more
      Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution
          at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
          at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
          at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
          at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName(Unknown Source)
          at okhttp3.Dns$1.lookup(Dns.java:40)
          at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185)
          at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149)
          at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84)
          at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215)
          at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
          at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
          at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
          at okhttp3.RealCall.execute(RealCall.java:93)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220)
          ... 26 more
      23/11/28 01:20:24 INFO SparkUI: Stopped Spark web UI at http://spark-pi-96895e8c1382ff30-driver-svc.default.svc:4040
      23/11/28 01:20:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
      23/11/28 01:20:24 INFO MemoryStore: MemoryStore cleared
      23/11/28 01:20:24 INFO BlockManager: BlockManager stopped
      23/11/28 01:20:24 INFO BlockManagerMaster: BlockManagerMaster stopped
      23/11/28 01:20:24 WARN MetricsSystem: Stopping a MetricsSystem that is not running
      23/11/28 01:20:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
      23/11/28 01:20:24 INFO SparkContext: Successfully stopped SparkContext
      Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
          at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961)
          at org.apache.spark.SparkContext.<init>(SparkContext.scala:557)
          at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672)
          at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945)
          at scala.Option.getOrElse(Option.scala:189)
          at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
          at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
          at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
          at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
          at java.base/java.lang.reflect.Method.invoke(Unknown Source)
          at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
          at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
          at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
          at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
          at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
          at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [spark-pi-96895e8c1382ff30-driver]  in namespace: [default]  failed.
          at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
          at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84)
          at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75)
          at scala.Option.map(Option.scala:230)
          at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74)
          at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123)
          at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955)
          ... 19 more
      Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution
          at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
          at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
          at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
          at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName(Unknown Source)
          at java.base/java.net.InetAddress.getAllByName(Unknown Source)
          at okhttp3.Dns$1.lookup(Dns.java:40)
          at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185)
          at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149)
          at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84)
          at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215)
          at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
          at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
          at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
          at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
          at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
          at okhttp3.RealCall.execute(RealCall.java:93)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416)
          at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933)
          at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220)
          ... 26 more
      23/11/28 01:20:24 INFO ShutdownHookManager: Shutdown hook called
      23/11/28 01:20:24 INFO ShutdownHookManager: Deleting directory /var/data/spark-f0634fda-1366-4da1-8ac2-262e4bf9952b/spark-0190347a-61ed-45b3-bddc-d0a92db7bcc8
      23/11/28 01:20:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-7ca3d253-94b6-442c-b557-f4270c3d12ce
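
      The innermost "Caused by" in the trace above is a plain JVM hostname lookup (InetAddress.getAllByName, reached through okhttp3.Dns.lookup), so the failure can probably be checked without Spark. Below is a minimal sketch, assuming it is compiled and run with a JDK inside a pod scheduled on one of the newly added worker nodes. The class name DnsCheck and its describe helper are hypothetical and not part of Spark or the fabric8 client.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper (not part of Spark or fabric8).
final class DnsCheck {

    // Performs the same JVM lookup that fails in the trace above and
    // returns a one-line summary instead of throwing.
    static String describe(String host) {
        try {
            String addresses = Arrays.stream(InetAddress.getAllByName(host))
                    .map(InetAddress::getHostAddress)
                    .collect(Collectors.joining(", "));
            return "OK " + host + " -> " + addresses;
        } catch (UnknownHostException e) {
            return "FAILED " + host + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults to the name from the error; other names can be passed as arguments.
        String[] hosts = args.length > 0 ? args : new String[] {"kubernetes.default.svc"};
        for (String host : hosts) {
            System.out.println(describe(host));
        }
    }
}
```

      Running the same check from a pod on an existing worker node and from a pod on a new one should show whether the lookup itself behaves differently between the two, independent of the Spark version.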

       

      Similar issue: https://issues.apache.org/jira/browse/SPARK-29640

          People

            Assignee: Unassigned
            Reporter: prakash gurung (prakashgurung123)