Description
The ClientRequestState maintains an internal results cache (which is really just a QueryResultSet) in order to provide support for the TFetchOrientation.FETCH_FIRST fetch orientation (used by Hue - see https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083).
The cache itself has some limitations:
- It caches all results in a QueryResultSet with limited admission control integration
- It has a max size; if the size is exceeded, the cache is emptied
- It cannot spill to disk
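The overflow behavior in the list above can be illustrated with a small model. This is only a sketch, not Impala's implementation: the class `BoundedResultCache`, its `max_rows` parameter, and the `can_restart` method are hypothetical names, and it assumes that once the cache has been emptied it can no longer serve a FETCH_FIRST restart.

```python
class BoundedResultCache:
    """Toy model of an in-memory result cache with a hard row limit.

    Hypothetical names throughout; this is not Impala's QueryResultSet.
    """

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.rows = []
        self.valid = True  # assumption: an emptied cache stays unusable

    def add(self, batch):
        if not self.valid:
            return
        if len(self.rows) + len(batch) > self.max_rows:
            # Size exceeded: empty the cache instead of spilling to disk.
            self.rows = []
            self.valid = False
            return
        self.rows.extend(batch)

    def can_restart(self):
        # A FETCH_FIRST-style restart needs every row returned so far.
        return self.valid


cache = BoundedResultCache(max_rows=4)
cache.add([1, 2, 3])
restartable_before = cache.can_restart()
cache.add([4, 5])  # pushes the cache past max_rows
restartable_after = cache.can_restart()
```

In this model a single oversized result set is enough to lose restartability for the rest of the query, which is the limitation a spill-capable buffer would avoid.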
Result spooling could potentially replace the query result cache and provide a few benefits: it should be able to fit more rows since it can spill to disk, and its memory is better tracked since it integrates with both admitted and reserved memory. Hue currently sets the max result set fetch size to the value defined at https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61. It would be good to check how well that value works for Hue users so we can decide whether replacing the current result cache with result spooling makes sense.
This would require some changes to result spooling as well: currently it discards rows as soon as it reads them from the underlying BufferedTupleStream. It would need the ability to reset the read cursor, which would in turn require some changes to the PlanRootSink interface.
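The difference between discarding rows on read and keeping a resettable read cursor can be sketched as follows. This is a toy model under stated assumptions, not the BufferedTupleStream or PlanRootSink API: `SpooledResults`, `discard_on_read`, and `reset_read_cursor` are hypothetical names, and a plain list stands in for the spooled (possibly spilled) rows.

```python
class SpooledResults:
    """Toy model of a spooled result buffer (hypothetical, not Impala code)."""

    def __init__(self, rows, discard_on_read=True):
        self._rows = list(rows)
        self._pos = 0  # read cursor, only advanced when rows are retained
        self._discard_on_read = discard_on_read

    def read(self, n):
        batch = self._rows[self._pos:self._pos + n]
        if self._discard_on_read:
            # Behavior described in the issue: rows are dropped once read.
            del self._rows[:len(batch)]
        else:
            # Proposed variant: keep the rows and only advance the cursor.
            self._pos += len(batch)
        return batch

    def reset_read_cursor(self):
        if self._discard_on_read:
            raise RuntimeError("rows were discarded on read; cannot rewind")
        self._pos = 0


spool = SpooledResults(range(5), discard_on_read=False)
first = spool.read(3)
spool.reset_read_cursor()  # what a FETCH_FIRST request would trigger
again = spool.read(3)
```

Retaining rows means the buffer holds the full result set for the life of the query, so in practice this trades extra memory or disk usage for the ability to restart the fetch.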
Issue Links
- is related to: IMPALA-4281 Move query result caching into Coordinator (Open)
- relates to: IMPALA-8656 Support for eagerly fetching and spooling all query result rows (Resolved)