Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.2, 3.5.0
-
None
-
None
Description
Spark submit driver fails to resolve "kubernetes.default.svc" when trying to create executors on newly added worker nodes, however there are no issue with the existing worker nodes.
Spark versions tried:
- 3.5.0
- 3.1.2
Kubernetes cluster on premises using kubeadm
- Kubernetes version: v1.28.2
- OS: Ubuntu 22.04.1 (Jammy)
- Container Runtime: 1.6.24
Complete error :
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@") + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.95.23 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int) WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release 23/11/28 01:20:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 23/11/28 01:20:03 INFO SparkContext: Running Spark version 3.1.2 23/11/28 01:20:03 INFO ResourceUtils: ============================================================== 23/11/28 01:20:03 INFO ResourceUtils: No custom resources configured for spark.driver. 23/11/28 01:20:03 INFO ResourceUtils: ============================================================== 23/11/28 01:20:03 INFO SparkContext: Submitted application: Spark Pi 23/11/28 01:20:03 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/11/28 01:20:03 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor 23/11/28 01:20:03 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/11/28 01:20:03 INFO SecurityManager: Changing view acls to: 185,root 23/11/28 01:20:03 INFO SecurityManager: Changing modify acls to: 185,root 23/11/28 01:20:03 INFO SecurityManager: Changing view acls groups to: 23/11/28 01:20:03 INFO SecurityManager: Changing modify acls groups to: 23/11/28 01:20:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(185, root); groups with view permissions: Set(); users with modify permissions: Set(185, root); groups with modify permissions: Set() 23/11/28 01:20:04 INFO Utils: Successfully started service 'sparkDriver' on port 7078. 23/11/28 01:20:04 INFO SparkEnv: Registering MapOutputTracker 23/11/28 01:20:04 INFO SparkEnv: Registering BlockManagerMaster 23/11/28 01:20:04 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/11/28 01:20:04 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/11/28 01:20:04 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/11/28 01:20:04 INFO DiskBlockManager: Created local directory at /var/data/spark-f0634fda-1366-4da1-8ac2-262e4bf9952b/blockmgr-7ac2193b-f7ad-4bc2-bdfa-386d2d3f4bf6 23/11/28 01:20:04 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB 23/11/28 01:20:04 INFO SparkEnv: Registering OutputCommitCoordinator 23/11/28 01:20:04 INFO Utils: Successfully started service 'SparkUI' on port 4040. 23/11/28 01:20:04 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-96895e8c1382ff30-driver-svc.default.svc:4040 23/11/28 01:20:04 INFO SparkContext: Added JAR local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar at file:/opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar with timestamp 1701134403914 23/11/28 01:20:04 WARN SparkContext: The jar local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar has been added already. Overwriting of added jars is not supported in the current version. 23/11/28 01:20:04 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file 23/11/28 01:20:24 ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: External scheduler cannot be instantiated at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961) at org.apache.spark.SparkContext.<init>(SparkContext.scala:557) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-96895e8c1382ff30-driver] in namespace: [default] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75) at scala.Option.map(Option.scala:230) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74) at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123) at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955) ... 19 more Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source) at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source) at java.base/java.net.InetAddress.getAllByName0(Unknown Source) at java.base/java.net.InetAddress.getAllByName(Unknown Source) at java.base/java.net.InetAddress.getAllByName(Unknown Source) at okhttp3.Dns$1.lookup(Dns.java:40) at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185) at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149) at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257) at okhttp3.RealCall.execute(RealCall.java:93) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220) ... 26 more 23/11/28 01:20:24 INFO SparkUI: Stopped Spark web UI at http://spark-pi-96895e8c1382ff30-driver-svc.default.svc:4040 23/11/28 01:20:24 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 23/11/28 01:20:24 INFO MemoryStore: MemoryStore cleared 23/11/28 01:20:24 INFO BlockManager: BlockManager stopped 23/11/28 01:20:24 INFO BlockManagerMaster: BlockManagerMaster stopped 23/11/28 01:20:24 WARN MetricsSystem: Stopping a MetricsSystem that is not running 23/11/28 01:20:24 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 23/11/28 01:20:24 INFO SparkContext: Successfully stopped SparkContext Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961) at org.apache.spark.SparkContext.<init>(SparkContext.scala:557) at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672) at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945) at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939) at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30) at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-96895e8c1382ff30-driver] in namespace: [default] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75) at scala.Option.map(Option.scala:230) at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74) at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123) at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955) ... 19 more Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source) at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source) at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source) at java.base/java.net.InetAddress.getAllByName0(Unknown Source) at java.base/java.net.InetAddress.getAllByName(Unknown Source) at java.base/java.net.InetAddress.getAllByName(Unknown Source) at okhttp3.Dns$1.lookup(Dns.java:40) at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185) at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149) at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135) at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114) at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147) at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121) at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257) at okhttp3.RealCall.execute(RealCall.java:93) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416) at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220) ... 26 more 23/11/28 01:20:24 INFO ShutdownHookManager: Shutdown hook called 23/11/28 01:20:24 INFO ShutdownHookManager: Deleting directory /var/data/spark-f0634fda-1366-4da1-8ac2-262e4bf9952b/spark-0190347a-61ed-45b3-bddc-d0a92db7bcc8 23/11/28 01:20:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-7ca3d253-94b6-442c-b557-f4270c3d12ce
Similar issue: https://issues.apache.org/jira/browse/SPARK-29640