Description
We currently iterate over each delta file several times: once for deletes and then once for each projected column.
It seems that, when selecting all the columns, it would be more efficient to apply the deltas to all columns in a single pass. Whether this is an advantage likely depends on the number of columns projected. Todd also suggests that it depends on whether predicates are being pushed down.
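Here is a minimal Python sketch contrasting the current multi-pass strategy with a single-pass one. It assumes a toy in-memory delta file whose records are sorted by row index. The record format, `DeltaFile`, `apply_multi_pass`, and `apply_single_pass` are hypothetical illustrations, not Kudu's actual delta encoding or API. The `passes` counter only exists to make the number of scans visible.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical delta record format (not Kudu's real on-disk encoding):
#   ("DELETE", row_idx)
#   ("UPDATE", row_idx, column_name, new_value)
# Records are assumed to be sorted by row_idx.
Delta = Union[Tuple[str, int], Tuple[str, int, str, object]]
Columns = Dict[str, List[object]]


class DeltaFile:
    """Toy in-memory delta file that counts how many times it is scanned."""

    def __init__(self, deltas: List[Delta]):
        self._deltas = deltas
        self.passes = 0

    def scan(self):
        self.passes += 1
        return iter(self._deltas)


def apply_multi_pass(base: Columns, deltas: DeltaFile,
                     projection: List[str]) -> Tuple[Columns, Set[int]]:
    """Current approach: one pass for deletes, then one pass per column."""
    deleted: Set[int] = set()
    for d in deltas.scan():
        if d[0] == "DELETE":
            deleted.add(d[1])
    out = {col: list(base[col]) for col in projection}
    for col in projection:
        for d in deltas.scan():  # re-reads the whole delta file per column
            if d[0] == "UPDATE" and d[2] == col:
                out[col][d[1]] = d[3]
    return out, deleted


def apply_single_pass(base: Columns, deltas: DeltaFile,
                      projection: List[str]) -> Tuple[Columns, Set[int]]:
    """Proposed approach: apply deletes and all projected columns in one pass."""
    wanted = set(projection)
    deleted: Set[int] = set()
    out = {col: list(base[col]) for col in projection}
    for d in deltas.scan():
        if d[0] == "DELETE":
            deleted.add(d[1])
        elif d[2] in wanted:  # update to any projected column
            out[d[2]][d[1]] = d[3]
    return out, deleted
```

With a two-column projection, the multi-pass version scans the delta file three times (once for deletes, once per column) while the single-pass version scans it once, and both produce the same result. The sketch deliberately does not model the factors that may change this trade-off, such as the number of projected columns or pushed-down predicates.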
We could likely also merge the updates and deletes into a single iteration, or at least avoid applying the mutations to a row that will end up deleted (right now we still apply the updates even when we find that the row will be deleted).
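Here is a minimal Python sketch of a merged delete/update pass that skips updates for rows that end up deleted. It works on a toy delta stream of `("DELETE", row_idx)` and `("UPDATE", row_idx, column_name, new_value)` tuples, not Kudu's real encoding, and makes two loudly stated assumptions: all deltas for a row are contiguous (the stream is sorted by row index), and a DELETE is final for its row (no re-insert after delete). `apply_merged` is a hypothetical name for illustration only.

```python
from itertools import groupby

# Hypothetical delta format, sorted by row_idx (not Kudu's real encoding):
#   ("DELETE", row_idx) or ("UPDATE", row_idx, column_name, new_value)
# Simplifying assumption: a DELETE is final for its row (no re-inserts).


def apply_merged(base, deltas, projection):
    """Single pass over deletes and updates that skips updates to deleted rows.

    Returns (projected columns, set of deleted row indexes, updates applied).
    """
    wanted = set(projection)
    out = {col: list(base[col]) for col in projection}
    deleted = set()
    applied = 0
    # Deltas for one row are contiguous because the stream is sorted by row.
    for row, group in groupby(deltas, key=lambda d: d[1]):
        mutations = list(group)
        if any(m[0] == "DELETE" for m in mutations):
            deleted.add(row)  # row ends up deleted: skip all of its updates
            continue
        for m in mutations:  # only UPDATE records remain for this row
            if m[2] in wanted:
                out[m[2]][row] = m[3]
                applied += 1
    return out, deleted, applied
```

For example, with deltas `[("UPDATE", 0, "a", 7), ("UPDATE", 1, "a", 8), ("DELETE", 1), ("UPDATE", 2, "a", 9)]` and base column `a = [1, 2, 3]`, the update to row 1 is never applied, so only two updates are written and column `a` becomes `[7, 2, 9]`. Buffering one row's mutations is cheap here because the stream is assumed to be sorted by row.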
Issue Links
- relates to: KUDU-749 Improve performance for zipfian update (Open)