Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Impala 2.7.0, Impala 2.8.0
Description
Codegen gets rather expensive when scanning wide Avro tables (>500 columns), regardless of how many columns are materialized by the query.
select count(int_col16) from functional_avro.widetable_250_cols;
+------------------+
| count(int_col16) |
+------------------+
| 10               |
+------------------+
Fetched 1 row(s) in 0.93s

select count(int_col16) from functional_avro.widetable_500_cols;
+------------------+
| count(int_col16) |
+------------------+
| 10               |
+------------------+
Fetched 1 row(s) in 2.87s

select count(int_col16) from widetable_1000_cols;
+------------------+
| count(int_col16) |
+------------------+
| 10               |
+------------------+
Fetched 1 row(s) in 10.58s
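To make the scaling visible, here is a rough sketch that estimates how fast the query time grows with the column count, using only the three timings above. It assumes a power-law model (time proportional to cols**k), which is an illustrative choice and not something the profile states. The timings are end-to-end query times, so they include more than codegen. The helper name `growth_exponents` is made up for this example.

```python
import math

# (column count, end-to-end seconds) copied from the three queries above.
timings = [(250, 0.93), (500, 2.87), (1000, 10.58)]

def growth_exponents(points):
    """Return the exponent k in time ~ cols**k for each consecutive pair.

    Hypothetical helper: assumes a power-law relationship between
    column count and elapsed time, purely for illustration.
    """
    exps = []
    for (c0, t0), (c1, t1) in zip(points, points[1:]):
        exps.append(math.log(t1 / t0) / math.log(c1 / c0))
    return exps

exponents = growth_exponents(timings)
for (c0, _), (c1, _), k in zip(timings, timings[1:], exponents):
    print(f"{c0} -> {c1} columns: exponent ~ {k:.2f}")
```

Both exponents come out above 1, so under this model each doubling of the column count more than doubles the query time, even though only one column is materialized.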
For the last query with 1000 columns, here's the codegen snippet from the query profile:
CodeGen:(Total: 10s115ms, non-child: 10s115ms, % non-child: 100.00%)
   - CodegenTime: 530.211us
   - CompileTime: 1s683ms
   - LoadTime: 0.000ns
   - ModuleBitcodeSize: 1.98 MB (2073044)
   - NumFunctions: 32 (32)
   - NumInstructions: 8.41K (8413)
   - OptimizationTime: 8s416ms
   - PeakMemoryUsage: 4.11 MB (4307456)
   - PrepareTime: 15.357ms
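As a quick way to read the profile, the sketch below converts the timing counters to seconds and reports each one as a share of the CodeGen total. The duration parser is an assumption based on the values shown in this profile (suffixes such as `s`, `ms`, `us`, `ns`); it is not Impala's own parsing code, and `to_seconds` is a hypothetical helper.

```python
import re

# Assumed unit suffixes for the pretty-printed durations, in seconds.
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter suffixes come first so "ms" is not read as "m" followed by "s".
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")

def to_seconds(text):
    """Convert a duration such as '8s416ms' or '530.211us' to seconds."""
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in TOKEN.findall(text))

# Timing counters copied from the CodeGen profile above.
counters = {
    "Total": "10s115ms",
    "CodegenTime": "530.211us",
    "CompileTime": "1s683ms",
    "LoadTime": "0.000ns",
    "OptimizationTime": "8s416ms",
    "PrepareTime": "15.357ms",
}

total = to_seconds(counters["Total"])
shares = {name: to_seconds(v) / total for name, v in counters.items() if name != "Total"}

# Print the counters from largest to smallest share of the total.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%}")
```

With these numbers, OptimizationTime is roughly 83% of the CodeGen total and CompileTime about 17%, so nearly all of the 10 seconds goes to optimizing and compiling the generated module, not to generating the IR itself.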