Description
We use groupByKey for aggregations (or, if hive.spark.use.groupby.shuffle is false, repartitionAndSortWithinPartitions).
groupByKey has drawbacks: it can't spill records within a single key group, so every value for a key has to fit in memory at once. It also seems to do some unnecessary work in Spark's Aggregator (not positive about this part).
repartitionAndSortWithinPartitions is better, but aggregations don't need the records sorted within each partition.
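
To illustrate the point about sorting, here is a minimal sketch in plain Python. It is not Hive or Spark code: the function names and the sample records are made up for illustration, and a simple sum stands in for the aggregation. It contrasts a group-then-reduce approach, which buffers every value of a key the way a groupByKey-style aggregation does, with an incremental one that keeps a single running value per key and never needs the input ordered.

```python
from collections import defaultdict


def aggregate_by_grouping(records):
    """Group-then-reduce: buffer every value of a key, then sum the buffer."""
    groups = defaultdict(list)
    for key, value in records:
        # The whole group is held in memory before any reduction happens.
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


def aggregate_incrementally(records):
    """Incremental: keep one running total per key, in any input order."""
    totals = defaultdict(int)
    for key, value in records:
        # Only a single accumulator per key is kept.
        totals[key] += value
    return dict(totals)


# Hypothetical (key, value) records, deliberately unsorted.
records = [("b", 2), ("a", 1), ("b", 5), ("a", 4), ("c", 3)]

grouped = aggregate_by_grouping(records)
incremental = aggregate_incrementally(records)
incremental_sorted = aggregate_incrementally(sorted(records))
```

Both approaches produce the same per-key totals, and sorting the records first does not change the incremental result. That is the sense in which a sort within partitions is extra work for this kind of aggregation.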