Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.5.1
Fix Version/s: None
Component/s: None
Description
SPARK-29336 caused a severe performance regression.
In practice, a partial_aggregate with several approx_percentile calls ran in under an hour, while the final aggregation after the exchange would have taken over a week.
A simple percentile took about the same time in the partial phase, and its final aggregate finished very quickly.
I made a benchmark, and it reveals that the merge operation is extremely slow: https://github.com/tanelk/spark/commit/3b16f429a77b10003572295f42361fbfb2f3c63e
From my experiments, the cost appears to grow quadratically (O(n^2)) with the number of partitions (the number of partial aggregates to merge).
When I reverted the changes made in that PR, the "Only insert" and "Insert & merge" benchmarks performed very similarly.
The cause seems to be that compressImmut barely reduces the number of samples after merges, so it just keeps iterating over an ever-growing list.
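The ever-growing list can be illustrated with a toy model (plain Python, not Spark's actual QuantileSummaries code; the buffer sizes and the compression cap below are made up for illustration). If compression fails to shrink the merged sample list, the k-th merge scans all samples accumulated so far, so merging n partitions does O(n^2) work in total; effective compression keeps each merge bounded.

```python
# Toy model of merging per-partition sample buffers. Each partition
# contributes a fixed-size buffer; "work" counts how many samples each
# merge step has to iterate over.

def merge_without_compression(buffers):
    """Merge by concatenation only: the list grows without bound."""
    merged, work = [], 0
    for buf in buffers:
        merged = merged + buf   # concatenate the incoming buffer
        work += len(merged)     # model: each merge scans the merged list
    return merged, work

def merge_with_compression(buffers, cap=100):
    """Merge, then shrink the result back under a fixed cap."""
    merged, work = [], 0
    for buf in buffers:
        merged = merged + buf
        work += len(merged)
        # Crude compression: keep roughly every k-th sample, at most `cap`.
        merged = merged[::max(1, len(merged) // cap)][:cap]
    return merged, work

# 200 partitions, 100 samples each.
bufs = [[float(i)] * 100 for i in range(200)]
_, slow = merge_without_compression(bufs)  # quadratic total work
_, fast = merge_with_compression(bufs)     # roughly linear total work
```

With 200 partitions the uncompressed variant scans 100 + 200 + ... + 20000 = 2,010,000 samples in total, while the compressed variant never scans more than 200 per step.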
I was not able to figure out how to fix the issue without just reverting the PR.
I also added a benchmark using KllDoublesSketch from the Apache DataSketches project, and it performed even better than this class did before the PR.
The only downside is that it is non-deterministic.
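To make the non-determinism concrete, here is a toy model (plain Python, not the actual KLL algorithm, whose compaction is far more sophisticated): sketches that compact by random sampling keep a random subset of the buffer on overflow, so two runs over identical input can report different quantiles, while fixing the seed reproduces the same answer.

```python
import random

# Toy model of sampling-based compaction (NOT the real KLL algorithm):
# when the buffer overflows, drop a random half of it. The surviving
# samples, and hence the reported quantile, depend on the random draws.
def toy_sketch_median(data, capacity=64, seed=None):
    rng = random.Random(seed)
    buf = []
    for x in data:
        buf.append(x)
        if len(buf) > capacity:
            rng.shuffle(buf)
            del buf[capacity // 2:]  # keep a random half
    buf.sort()
    return buf[len(buf) // 2]

data = [float(x) for x in range(10_000)]
# Same seed -> same result; different seeds -> generally different results.
```

This is why a deterministic structure like the pre-PR QuantileSummaries gives reproducible query plans and results, whereas a sampling-based sketch trades that reproducibility for speed.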