Spark / SPARK-33005 Kubernetes GA Preparation / SPARK-29905

ExecutorPodsLifecycleManager has sub-optimal behavior with dynamic allocation


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.1.0
    • Fix Version/s: 3.1.0
    • Component/s: Kubernetes, Spark Core
    • Labels: None

    Description

      I've been playing with dynamic allocation on k8s and noticed some weird behavior from ExecutorPodsLifecycleManager when it's on.

      This behavior is mostly caused by the higher rate of pod updates when dynamic allocation is on. Pods being created and going away all the time generate lots of events, which are then translated into "snapshots" internally in Spark and fed to subscribers such as ExecutorPodsLifecycleManager.
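
      The following is a minimal sketch of that event-to-snapshot flow, just to make the mechanism concrete. The names (PodState, PodUpdate, SnapshotStore) are made up for illustration and are not the actual Spark classes; the real code is more involved.

      import scala.collection.mutable

      // Toy model of an executor pod's state as seen by the driver.
      sealed trait PodState
      case object PodRunning extends PodState
      case object PodDeleted extends PodState

      // One watch event: the latest known state for a single executor pod.
      case class PodUpdate(execId: Long, state: PodState)

      class SnapshotStore {
        private val current = mutable.Map[Long, PodState]()
        private val subscribers = mutable.Buffer[Map[Long, PodState] => Unit]()

        def addSubscriber(s: Map[Long, PodState] => Unit): Unit = subscribers += s

        // Every incoming event produces a fresh snapshot that is pushed to all
        // subscribers, so the same pod state can be re-delivered many times when
        // updates arrive at a high rate.
        def onEvent(update: PodUpdate): Unit = {
          current(update.execId) = update.state
          val snapshot = current.toMap
          subscribers.foreach(_(snapshot))
        }
      }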

      The first effect of that higher update rate is a lot of spurious logging. Since snapshots are incremental, you can get many snapshots carrying the same "PodDeleted" information, for example, and ExecutorPodsLifecycleManager will log a message for every one of them. Yes, the messages are at debug level, but if you're debugging that code, the output is really noisy and distracting.

      The second effect is that, just as you get multiple log messages, you also end up calling into the Spark scheduler and, worse, into the K8S API server multiple times for the same pod update. We can optimize that and reduce the chattiness with the API server, as in the sketch below.
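
      One way to cut that duplication down, sketched here with the same toy types and made-up names as above (DedupingLifecycleHandler is not the real class), is for the subscriber to remember which pods it has already processed, so only the first snapshot carrying a pod's terminal state triggers the scheduler callback and the API server call.

      class DedupingLifecycleHandler(
          removeFromScheduler: Long => Unit,    // stand-in for notifying the Spark scheduler
          deleteFromApiServer: Long => Unit) {  // stand-in for the K8S API server call

        // Executor ids whose deletion has already been handled.
        private val handled = scala.collection.mutable.Set[Long]()

        def onNewSnapshot(snapshot: Map[Long, PodState]): Unit = {
          snapshot.foreach { case (execId, state) =>
            // Only the first snapshot that reports the pod as deleted does any work;
            // later snapshots carrying the same information are ignored.
            if (state == PodDeleted && !handled.contains(execId)) {
              handled += execId
              removeFromScheduler(execId)
              deleteFromApiServer(execId)
            }
          }
        }
      }

      The same check would also suppress the repeated debug logging mentioned above.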

            People

              Assignee: Marcelo Masiero Vanzin (vanzin)
              Reporter: Marcelo Masiero Vanzin (vanzin)
              Votes: 0
              Watchers: 2
