[FLINK-33768] FLIP-379: Support dynamic source parallelism inference for batch jobs - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: 1.19.0
Fix Version/s: 1.19.0
Component/s: Runtime / Coordination
Labels:
- pull-request-available

Release Note:

Flink 1.19 supports dynamic source parallelism inference for batch jobs, which allows source connectors to infer their parallelism from the actual amount of data to consume. Previous versions assigned only a fixed default parallelism to source vertices.

Source connectors need to implement the inference interface to enable dynamic parallelism inference. Currently, the FileSource connector already implements it.
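
As a rough illustration of what such an inference might compute, the sketch below derives a parallelism from the data volume and clamps it to an upper bound. The class, method, and parameter names are hypothetical and do not mirror the Flink interface; the actual contract is the one defined in FLIP-379.

```java
// Hypothetical sketch, not Flink code: infer a source parallelism from data volume.
final class SourceParallelismSketch {
    private SourceParallelismSketch() {}

    /**
     * @param totalBytes   estimated amount of data the source will read (assumed input)
     * @param bytesPerTask desired amount of data per subtask (assumed input)
     * @param upperBound   largest parallelism the inference may return
     */
    static int inferParallelism(long totalBytes, long bytesPerTask, int upperBound) {
        if (bytesPerTask <= 0 || upperBound < 1) {
            throw new IllegalArgumentException("bytesPerTask and upperBound must be positive");
        }
        // Ceiling division without risking overflow on very large inputs.
        long needed = totalBytes / bytesPerTask + (totalBytes % bytesPerTask == 0 ? 0 : 1);
        // Always run at least one subtask and never exceed the upper bound.
        return (int) Math.max(1L, Math.min(needed, (long) upperBound));
    }
}
```

With these assumptions, a small input yields a small parallelism, while a very large input is capped at the upper bound rather than growing without limit.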

Additionally, the configuration `execution.batch.adaptive.auto-parallelism.default-source-parallelism` is now used as the upper bound of source parallelism inference and no longer defaults to 1. If it is not set, the upper bound of allowed parallelism set via `execution.batch.adaptive.auto-parallelism.max-parallelism` is used. If that configuration is also not set, the default parallelism set via `parallelism.default` or StreamExecutionEnvironment#setParallelism() is used instead.
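
The fallback order described above can be sketched as follows. This is only a minimal model of the release note's wording, not the scheduler's implementation; the class and method names are hypothetical, and an empty `OptionalInt` stands in for "not set".

```java
import java.util.OptionalInt;

// Hypothetical sketch, not Flink code: resolve the upper bound for source parallelism inference.
final class SourceParallelismUpperBound {
    private SourceParallelismUpperBound() {}

    /**
     * @param defaultSourceParallelism value of
     *        execution.batch.adaptive.auto-parallelism.default-source-parallelism, if set
     * @param maxParallelism value of
     *        execution.batch.adaptive.auto-parallelism.max-parallelism, if set
     * @param defaultParallelism value from parallelism.default or
     *        StreamExecutionEnvironment#setParallelism()
     */
    static int resolve(
            OptionalInt defaultSourceParallelism,
            OptionalInt maxParallelism,
            int defaultParallelism) {
        // 1. An explicit default-source-parallelism wins.
        if (defaultSourceParallelism.isPresent()) {
            return defaultSourceParallelism.getAsInt();
        }
        // 2. Otherwise fall back to the adaptive max-parallelism.
        if (maxParallelism.isPresent()) {
            return maxParallelism.getAsInt();
        }
        // 3. Otherwise use the job's default parallelism.
        return defaultParallelism;
    }
}
```

For example, `resolve(OptionalInt.empty(), OptionalInt.of(64), 8)` falls through the first option and returns 64.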

Description

Currently, for JobVertices without a configured parallelism, the AdaptiveBatchScheduler dynamically infers the vertex parallelism from the volume of input data. For Source vertices, however, it uses the value of `execution.batch.adaptive.auto-parallelism.default-source-parallelism` as a fixed parallelism. If the user does not set it, the default value of 1 is used as the source parallelism, which is only a temporary implementation.

We aim to support dynamic source parallelism inference for batch jobs. For more details, see FLIP-379.

Issue Links

is related to

FLINK-34356 Release Testing: Verify FLINK-33768 Support dynamic source parallelism inference for batch jobs (Closed)

links to

GitHub Pull Request #24087

Sub-Tasks

1. Update the documentation and configuration description about dynamic source parallelism inference (Closed; xingbe)
2. Modify the effective strategy of `execution.batch.adaptive.auto-parallelism.default-source-parallelism` (Closed; xingbe)
3. File source connector support dynamic source parallelism inference in batch jobs (Closed; xingbe)

People

Assignee: xingbe
Reporter: xingbe
Votes: 0
Watchers: 2

Dates

Created: 07/Dec/23 01:58
Updated: 07/Feb/24 10:02
Resolved: 01/Feb/24 09:31
