[FLINK-35165] AdaptiveBatch Scheduler should not restrict the default source parallelism to the max parallelism set - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Runtime / Coordination
Labels:
- pull-request-available

Description

Copy-pasting the reasoning mentioned on this discussion thread.

Let me state why I think "jobmanager.adaptive-batch-scheduler.default-source-parallelism" should not be bound by the "jobmanager.adaptive-batch-scheduler.max-parallelism".

- The source vertex is unique in that it has no upstream vertices. Downstream vertices read shuffled data partitioned by key, which is not the case for the source vertex.
- Limiting the source parallelism by the downstream vertices' max parallelism is therefore incorrect.
- If, for "semantic consistency", the source vertex parallelism has to be bound by the overall job's max parallelism, this can lead to the following issues:
  - Jobs with high filter selectivity that read huge amounts of data: the source needs a high parallelism to read the input, even though little data reaches the downstream vertices.
  - Setting a high "jobmanager.adaptive-batch-scheduler.max-parallelism" just so that the source parallelism can be set higher can lead to small blocks and sub-optimal performance.
  - Setting a high "jobmanager.adaptive-batch-scheduler.max-parallelism" just so that the source parallelism can be set high also requires careful tuning of network buffer configurations, which would otherwise be unnecessary.
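To make the argument concrete, here is a minimal Java sketch under stated assumptions. It is a simplified model, not the actual AdaptiveBatchScheduler code: the class and all method names are hypothetical, the downstream rule (input bytes divided by a per-task data volume, clamped to the max parallelism) is an assumed simplification, and so is the assumption that each upstream task splits its output into as many subpartitions as the downstream max parallelism.

```java
// Hypothetical, simplified model of the parallelism decisions discussed in this issue.
// None of these names exist in Flink; they only illustrate the argument.
public class SourceParallelismSketch {

    // Behavior described in the issue title: the default source parallelism
    // is capped by the adaptive batch scheduler's max parallelism.
    static int sourceParallelismCapped(int defaultSourceParallelism, int maxParallelism) {
        return Math.min(defaultSourceParallelism, maxParallelism);
    }

    // Proposed behavior: the source has no upstream vertex, so its default
    // parallelism is used as-is, independent of the downstream max parallelism.
    static int sourceParallelismUncapped(int defaultSourceParallelism) {
        return defaultSourceParallelism;
    }

    // Assumed rule for downstream vertices: derive parallelism from the input
    // data volume (ceiling division), then clamp it into [1, maxParallelism].
    static int downstreamParallelism(long inputBytes, long bytesPerTask, int maxParallelism) {
        long wanted = (inputBytes + bytesPerTask - 1) / bytesPerTask;
        return (int) Math.max(1, Math.min(wanted, maxParallelism));
    }

    // Assumed: each upstream task writes maxParallelism subpartitions,
    // so raising maxParallelism shrinks every block.
    static long bytesPerSubpartition(long bytesPerProducerTask, int maxParallelism) {
        return bytesPerProducerTask / maxParallelism;
    }

    public static void main(String[] args) {
        int defaultSourceParallelism = 1000;

        // With max parallelism 100, the capped rule silently lowers the source to 100 tasks.
        System.out.println(sourceParallelismCapped(defaultSourceParallelism, 100));   // 100
        System.out.println(sourceParallelismUncapped(defaultSourceParallelism));      // 1000

        // Highly selective filter: assume only 10 GiB reaches downstream, 1 GiB per task.
        long filteredBytes = 10L * 1024 * 1024 * 1024;
        long bytesPerTask = 1024L * 1024 * 1024;
        System.out.println(downstreamParallelism(filteredBytes, bytesPerTask, 100)); // 10

        // Workaround of raising max parallelism to 1000: blocks shrink tenfold
        // (assume each producer task writes 64 MiB).
        long producerBytes = 64L * 1024 * 1024;
        System.out.println(bytesPerSubpartition(producerBytes, 100));   // 671088
        System.out.println(bytesPerSubpartition(producerBytes, 1000));  // 67108
    }
}
```

In this toy model, the only way to get 1000 source tasks under the capped rule is to raise the max parallelism to 1000, which (under the assumed subpartitioning rule) makes every block ten times smaller even though the downstream vertex only needs about 10 tasks.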

Issue Links

links to

GitHub Pull Request #24736

People

Assignee:: Unassigned

Reporter:: Venkata krishnan Sowrirajan

Votes:: 0

Watchers:: 1

Dates

Created:: 18/Apr/24 19:15

Updated:: 4 days ago 03:31