Details
-
Sub-task
-
Status: Closed
-
Major
-
Resolution: Fixed
-
Impala 2.9.0
-
None
Description
IMPALA-5004 adds a new query-level option, 'topn_bytes_limit', that we should document. The change in IMPALA-5004 works by estimating the amount of memory required to run a TopN operator. The estimate is based on the size of the individual tuples that the TopN operator needs to process and on the sum of the limit and offset in the query. TopN operators don't spill to disk, so they have to keep all rows they process in memory.
If the estimated size of the TopN operator's working set exceeds the 'topn_bytes_limit' threshold, the TopN operator is replaced with a Sort operator. The Sort operator can spill to disk, but it processes all the data (the limit and offset have no effect). Switching to Sort might therefore incur a performance penalty, but it will require less memory.
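For the documentation, the decision described above could be illustrated with a small sketch. This is not the planner code from IMPALA-5004. The names `TopNQuery`, `estimate_topn_bytes`, and `choose_operator` are hypothetical, and the estimate is assumed to be simply the tuple size multiplied by (limit + offset); the real planner may compute it differently.

```python
from dataclasses import dataclass


@dataclass
class TopNQuery:
    """Hypothetical stand-in for the inputs the planner looks at."""
    tuple_size_bytes: int  # size of one tuple processed by the TopN operator
    limit: int             # LIMIT value in the query
    offset: int = 0        # OFFSET value in the query


def estimate_topn_bytes(query: TopNQuery) -> int:
    # Assumption: working set = tuple size * (limit + offset).
    return query.tuple_size_bytes * (query.limit + query.offset)


def choose_operator(query: TopNQuery, topn_bytes_limit: int) -> str:
    # TopN cannot spill to disk, so fall back to Sort (which can spill)
    # when the estimate exceeds the threshold.
    if estimate_topn_bytes(query) > topn_bytes_limit:
        return "Sort"
    return "TopN"


small = TopNQuery(tuple_size_bytes=100, limit=1_000)
large = TopNQuery(tuple_size_bytes=100, limit=10_000, offset=5_000)
print(choose_operator(small, topn_bytes_limit=1_000_000))
print(choose_operator(large, topn_bytes_limit=1_000_000))
```

With a threshold of 1,000,000 bytes, the first query's estimate (100,000 bytes) stays under the limit and keeps TopN, while the second (1,500,000 bytes) exceeds it and switches to Sort.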