[SPARK-29799] Split a kafka partition into multiple KafkaRDD partitions in the kafka external plugin for Spark Streaming - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 3.1.0
Fix Version/s: None
Component/s: Structured Streaming
Labels: None

Description

When we use Spark Streaming to consume records from Kafka, the number of partitions in the generated KafkaRDD equals the number of partitions in the Kafka topic. As a result, we cannot use more CPU cores to execute the streaming task unless we increase the topic's partition count, and we cannot increase that count indefinitely.

I propose splitting a single Kafka partition into multiple KafkaRDD partitions, with the split count configurable, so that more CPU cores can be used to execute the streaming task.
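
To illustrate the idea, here is a minimal sketch of how one Kafka partition's offset range could be divided into several contiguous sub-ranges, each of which could back its own KafkaRDD partition. It is not the code from the attached patch. The `OffsetRange` class is a hypothetical stand-in loosely modeled on the topic, partition, from-offset, and until-offset fields of Spark's Kafka integration, and `split_offset_range` and `max_splits` are names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical stand-in for a per-partition offset range."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive


def split_offset_range(r: OffsetRange, max_splits: int) -> List[OffsetRange]:
    """Split one offset range into at most max_splits contiguous sub-ranges.

    max_splits plays the role of the configurable split count (an assumed
    name, not a real Spark setting).
    """
    if max_splits < 1:
        raise ValueError("max_splits must be at least 1")
    size = r.until_offset - r.from_offset
    # Never create more sub-ranges than there are offsets to read.
    n = min(max_splits, size)
    if n <= 1:
        return [r]
    base, extra = divmod(size, n)
    pieces = []
    start = r.from_offset
    for i in range(n):
        # The first `extra` pieces take one additional offset each.
        end = start + base + (1 if i < extra else 0)
        pieces.append(OffsetRange(r.topic, r.partition, start, end))
        start = end
    return pieces


# Example: 10 offsets of partition 0 spread over 3 RDD partitions.
for piece in split_offset_range(OffsetRange("events", 0, 100, 110), 3):
    print(piece)
```

Two caveats apply to any scheme of this kind. Sub-ranges processed in parallel no longer preserve the per-partition record order, and on compacted topics offsets can have gaps, so equally sized offset ranges may hold unequal numbers of records.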

Attachments

0001-add-implementation-for-issue-SPARK-29799.patch
10/Nov/19 03:31
3 kB
zengrui

Issue Links

duplicates

SPARK-23541 Allow Kafka source to read data with greater parallelism than the number of topic-partitions

Resolved

links to

GitHub Pull Request #28182

People

Assignee: Unassigned

Reporter: zengrui

Votes: 0

Watchers: 2

Dates

Created: 08/Nov/19 03:17

Updated: 13/Apr/20 03:27

Resolved: 13/Apr/20 03:27