[SPARK-35793] Repartition before writing data source tables - ASF JIRA

This umbrella ticket aims to track work on repartitioning before writing data source tables. It contains:

Repartition by the dynamic partition columns before writing dynamic partition tables.
Repartition before writing normal tables to avoid generating too many small files.
Improve the local shuffle reader.
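
To illustrate why the first item matters, here is a minimal sketch in plain Python, not Spark code. It assumes a simplified model in which each write task emits one file per distinct partition value it holds. The function names (`files_written`, `repartition_by`), the `dt` column, and the sample data are all hypothetical and chosen only for illustration.

```python
def files_written(task_rows, key):
    """Count output files under the simplified model:
    every task writes one file per distinct value of `key` it holds."""
    return sum(len({row[key] for row in rows}) for rows in task_rows)


def repartition_by(task_rows, key, num_tasks):
    """Hash-shuffle rows so that all rows sharing `key` land in the same task."""
    shuffled = [[] for _ in range(num_tasks)]
    for rows in task_rows:
        for row in rows:
            shuffled[hash(row[key]) % num_tasks].append(row)
    return shuffled


# Hypothetical input: 4 tasks, each holding rows for the same 3 dates.
dates = ("2021-06-01", "2021-06-02", "2021-06-03")
task_rows = [[{"dt": d, "v": t} for d in dates] for t in range(4)]

# Without repartitioning, every task writes a file for every date it holds.
before = files_written(task_rows, "dt")

# After repartitioning by the partition column, each date lives in one task.
after = files_written(repartition_by(task_rows, "dt", 4), "dt")

print(before, after)
```

In this model the file count drops from (tasks x distinct partition values) to the number of distinct partition values. The trade-off is an extra shuffle, and a skewed partition value can concentrate all of its rows in a single task.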

#	Sub-task	Status	Assignee
1.	Repartition by dynamic partition columns before insert table	In Progress	Unassigned
2.	Support repartition expand partitions in AQE	Resolved	XiDuo You
3.	Coalesce small output files through AQE	Resolved	Yuming Wang
4.	Improve CoalesceShufflePartitions to avoid generating small files	In Progress	Unassigned
5.	A not very elegant way to control output small file	In Progress	Unassigned
6.	Add a new operator to distinguish if AQE can optimize safely	Resolved	XiDuo You
7.	Collapse above RebalancePartitions	Closed	Yuming Wang
8.	Only use local shuffle reader when REBALANCE_PARTITIONS_BY_NONE without CustomShuffleReaderExec	Resolved	XiDuo You
9.	Reduce the output partition of output stage to avoid producing small files.	In Progress	Unassigned
10.	Support specify initial partition number for rebalance	Resolved	XiDuo You