IMPALA-12101

Inconsistent speeds with result spooling

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Component/s: Backend, Clients

    Description

      I noticed a case where enabling result spooling makes query execution much slower:

      impala-shell -B -q "set spool_query_results=1; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
      Fetched 6001215 row(s) in 23.81s

      impala-shell -B -q "set spool_query_results=0; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
      Fetched 6001215 row(s) in 9.92s

      Using beeswax leads to completely different results:

      impala-shell --protocol=beeswax -B -q "set spool_query_results=1; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
      Fetched 6001215 row(s) in 10.32s

      impala-shell --protocol=beeswax -B -q "set spool_query_results=0; select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;" > /dev/null
      Fetched 6001215 row(s) in 11.87s
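
      The four measurements above are single runs, so a small script that repeats the same protocol/spooling matrix may make the comparison easier to reproduce. This is only a sketch: the function names build_cmd and run_matrix and the DRY_RUN switch are hypothetical, and --protocol=hs2 is an assumption standing in for the first two runs, which did not name a protocol. By default it only prints the commands; set DRY_RUN=0 to execute them against a cluster that has tpch_parquet.lineitem.

```shell
#!/bin/sh
# Sketch: rebuild the four impala-shell invocations from the report.
# Assumption: "hs2" stands in for the runs where the report gave no --protocol flag.
QUERY='select cast(l_shipdate as timestamp) from tpch_parquet.lineitem;'

# Print one impala-shell command line.
# $1 = protocol, $2 = value for spool_query_results
build_cmd() {
  printf 'impala-shell --protocol=%s -B -q "set spool_query_results=%s; %s"\n' \
    "$1" "$2" "$QUERY"
}

# Walk every protocol/spooling combination.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
run_matrix() {
  for protocol in hs2 beeswax; do
    for spool in 1 0; do
      cmd=$(build_cmd "$protocol" "$spool")
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
      else
        echo "== protocol=$protocol spool_query_results=$spool"
        # Rows go to /dev/null as in the report; the "Fetched ... in ...s"
        # summary was still visible there, so it should remain visible here.
        sh -c "$cmd > /dev/null"
      fi
    done
  done
}

run_matrix
```

      Running each combination several times and comparing the "Fetched ... in ...s" lines would show whether the gap between the two spooling settings is stable or just run-to-run noise.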

      This anomaly seems to occur when both the client and the coordinator need significant time to process the returned rows.

      Note that slow result generation from timestamps (and dates) is a known performance issue in the coordinator: most of the time is spent converting dates/timestamps to strings. On the other hand, I don't understand how enabling result spooling can slow down a query.

          People

            Assignee: Unassigned
            Reporter: Csaba Ringhofer (csringhofer)