[IMPALA-12101] Inconsistent speeds with result spooling - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: None
Fix Version/s: None
Component/s: Backend, Clients
Labels:
- performance

Description

Noticed a case where enabling result spooling makes query execution much slower (23.81s vs. 9.92s):

impala-shell -B -q "set spool_query_results=1; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
Fetched 6001215 row(s) in 23.81s

impala-shell -B -q "set spool_query_results=0; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
Fetched 6001215 row(s) in 9.92s

Using the Beeswax protocol gives completely different results; here, enabling result spooling is slightly faster:

impala-shell --protocol=beeswax -B -q "set spool_query_results=1; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
Fetched 6001215 row(s) in 10.32s

impala-shell --protocol=beeswax -B -q "set spool_query_results=0; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
Fetched 6001215 row(s) in 11.87s
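To make the comparison easy to repeat, a small helper can parse the "Fetched ... row(s) in ...s" summary lines printed by impala-shell and compute the spooling-on/spooling-off time ratio for each protocol. This is a minimal sketch: the function names (`parse_fetched`, `spooling_ratio`) are hypothetical, the regex assumes the summary line keeps exactly the format shown in the runs above, and the script only analyzes reported numbers; it does not run any queries.

```python
import re

# Matches the impala-shell summary line, assuming the format seen above,
# e.g. "Fetched 6001215 row(s) in 23.81s".
FETCHED_RE = re.compile(r"Fetched (\d+) row\(s\) in ([\d.]+)s")


def parse_fetched(line):
    """Return (rows, seconds) parsed from a 'Fetched ...' summary line."""
    m = FETCHED_RE.search(line)
    if m is None:
        raise ValueError(f"not a 'Fetched' summary line: {line!r}")
    return int(m.group(1)), float(m.group(2))


def spooling_ratio(spool_on_line, spool_off_line):
    """Elapsed time with spooling divided by elapsed time without it.

    A ratio above 1.0 means enabling spool_query_results made the fetch slower.
    """
    _, t_on = parse_fetched(spool_on_line)
    _, t_off = parse_fetched(spool_off_line)
    return t_on / t_off


# Timings reported in this issue: (spool_query_results=1, spool_query_results=0).
runs = {
    "default": ("Fetched 6001215 row(s) in 23.81s", "Fetched 6001215 row(s) in 9.92s"),
    "beeswax": ("Fetched 6001215 row(s) in 10.32s", "Fetched 6001215 row(s) in 11.87s"),
}

if __name__ == "__main__":
    for protocol, (on, off) in runs.items():
        rows, t_on = parse_fetched(on)
        print(f"{protocol}: spooling on/off = {spooling_ratio(on, off):.2f}, "
              f"{rows / t_on:,.0f} rows/s with spooling")
```

With these numbers, spooling is about 2.4x slower with the default protocol but slightly faster over Beeswax.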

This anomaly seems to occur when both the client and the coordinator need significant time to process the returned rows.

Note that slow result generation for timestamps (and dates) is a known performance issue in the coordinator: most of the time is spent converting dates/timestamps to strings. On the other hand, I don't understand how enabling result spooling can slow down a query.

People

Assignee: Unassigned

Reporter: Csaba Ringhofer

Votes: 0

Watchers: 3

Dates

Created: 26/Apr/23 11:28

Updated: 27/Apr/23 17:26

Resolved: 27/Apr/23 17:26