SPARK-32141: Repartition leads to out of memory


Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Environment: EC2
    • Labels: None

    Description

      We have an application that aggregates on 7 columns. To avoid an extra shuffle, we repartition on those same 7 columns before aggregating. This works well with 1 to 4 TB of data, but once the input grows past 4 TB the job fails with OOM errors or runs out of disk space.
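      As a minimal sketch of the pattern (the column names, input path, and the sum("value") aggregate are hypothetical placeholders, not our real schema):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder.getOrCreate()

// Hypothetical input path; stands in for our real dataset.
val df = spark.read.parquet("s3://bucket/input")

val keys = Seq("c1", "c2", "c3", "c4", "c5", "c6", "c7")

val result = df
  .repartition(keys.map(df(_)): _*)    // one shuffle, hash-partitioned on the 7 keys
  .groupBy(keys.head, keys.tail: _*)   // can reuse that partitioning instead of shuffling again
  .agg(sum("value").as("total"))       // hypothetical aggregate
{code}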

       

      Is there a better approach to reducing the shuffle? For our biggest dataset, the Spark job has never completed with repartition. We are out of options.

       

      We have a 24-node cluster of r5.24xlarge machines with 1 TB of disk each. Our shuffle partition count (spark.sql.shuffle.partitions) is set to 6912.
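      For reference, this is roughly how that setting is applied in our jobs (the app name is hypothetical; only the 6912 value is from our setup):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("seven-column-agg")                      // hypothetical app name
  .config("spark.sql.shuffle.partitions", "6912")   // default partition count for shuffles and repartition-by-expression
  .getOrCreate()
{code}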


          People

            Assignee: Unassigned
            Reporter: Lekshmi Nair (lekshminair24)
            Votes: 0
            Watchers: 1
