Description
HIVE-15580 introduced a dummy iterator for each input row. It can be eliminated, because SparkReduceRecordHandler is able to handle single key-value pairs. We can refactor this part of the code to (1) remove the need for an iterator and (2) optimize the code path for per-(key, value) processing instead of (key, value iterator) processing. It would also be great to measure performance after the optimizations and compare it with performance prior to HIVE-15580.
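The difference between the two code paths can be sketched roughly as follows. This is only an illustration under stated assumptions: `RowHandler`, `CollectingHandler`, `PerPairSketch` and their method names are hypothetical stand-ins, not the actual SparkReduceRecordHandler API, and the "processing" is reduced to collecting strings.

```java
import java.util.ArrayList;
import java.util.AbstractMap;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a reduce-side record handler (not Hive's real class).
interface RowHandler {
    // (key, value iterator) based processing
    void processValues(Object key, Iterator<?> values);

    // per (key, value) based processing
    void processRow(Object key, Object value);
}

// Toy handler that records each processed pair as "key=value".
final class CollectingHandler implements RowHandler {
    final List<String> out = new ArrayList<>();

    @Override
    public void processValues(Object key, Iterator<?> values) {
        while (values.hasNext()) {
            processRow(key, values.next());
        }
    }

    @Override
    public void processRow(Object key, Object value) {
        out.add(key + "=" + value);
    }
}

final class PerPairSketch {
    // Assumed "before" shape: every input row is wrapped in a one-element iterator.
    static List<String> viaDummyIterator(List<Map.Entry<String, String>> rows) {
        CollectingHandler handler = new CollectingHandler();
        for (Map.Entry<String, String> row : rows) {
            Iterator<String> dummy =
                Collections.singletonList(row.getValue()).iterator();
            handler.processValues(row.getKey(), dummy);
        }
        return handler.out;
    }

    // Assumed "after" shape: the pair is handed over directly, no iterator allocated.
    static List<String> direct(List<Map.Entry<String, String>> rows) {
        CollectingHandler handler = new CollectingHandler();
        for (Map.Entry<String, String> row : rows) {
            handler.processRow(row.getKey(), row.getValue());
        }
        return handler.out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        rows.add(new AbstractMap.SimpleEntry<>("k1", "a"));
        rows.add(new AbstractMap.SimpleEntry<>("k2", "b"));
        System.out.println(viaDummyIterator(rows));
        System.out.println(direct(rows));
    }
}
```

Both methods produce the same result; the direct variant simply avoids allocating a list and an iterator per row, which is the kind of per-row overhead the proposed refactoring aims to remove. Whether that overhead is measurable in practice is exactly what the requested performance comparison would show.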
Attachments
Issue Links
- is related to: HIVE-15580 Eliminate unbounded memory usage for orderBy and groupBy in Hive on Spark (Resolved)
- relates to: HIVE-15683 Make what's done in HIVE-15580 for group by configurable (Resolved)