IMPALA-5243

Slow codegen for wide Avro tables

    Description

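      Codegen becomes expensive when scanning wide Avro tables (roughly 500 columns or more), regardless of how many columns the query materializes. Each query below materializes a single column, yet the elapsed time grows from 0.93s at 250 columns to 10.58s at 1000 columns.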

      select count(int_col16) from functional_avro.widetable_250_cols;
      +------------------+
      | count(int_col16) |
      +------------------+
      | 10               |
      +------------------+
      Fetched 1 row(s) in 0.93s
      
      select count(int_col16) from functional_avro.widetable_500_cols;
      +------------------+
      | count(int_col16) |
      +------------------+
      | 10               |
      +------------------+
      Fetched 1 row(s) in 2.87s
      
      select count(int_col16) from widetable_1000_cols;
      +------------------+
      | count(int_col16) |
      +------------------+
      | 10               |
      +------------------+
      Fetched 1 row(s) in 10.58s
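
To make the scaling in the three timings above easier to see, here is a minimal sketch that estimates the local growth exponent k in time ~ columns**k between adjacent runs. It uses only the elapsed times reported above; the function name `growth_exponents` is hypothetical, and because these are single end-to-end measurements that include non-codegen overhead, the result is only a rough indication.

```python
import math

# Elapsed times copied from the three queries above: (column count, seconds).
timings = [(250, 0.93), (500, 2.87), (1000, 10.58)]


def growth_exponents(points):
    """Return ((cols_before, cols_after), slowdown, exponent) per adjacent pair.

    The exponent k is the value for which time ~ columns**k between two runs.
    """
    result = []
    for (c0, t0), (c1, t1) in zip(points, points[1:]):
        ratio = t1 / t0
        k = math.log(ratio) / math.log(c1 / c0)
        result.append(((c0, c1), ratio, k))
    return result


for (c0, c1), ratio, k in growth_exponents(timings):
    print(f"{c0} -> {c1} columns: {ratio:.2f}x slower, exponent ~{k:.2f}")
```

Each doubling of the column count makes the query more than three times slower, so the cost appears to grow faster than linearly in the number of columns.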
      

      For the last query (1000 columns), the CodeGen section of the query profile accounts for 10s115ms of the 10.58s elapsed time, most of it in OptimizationTime (8s416ms):

              CodeGen:(Total: 10s115ms, non-child: 10s115ms, % non-child: 100.00%)
                 - CodegenTime: 530.211us
                 - CompileTime: 1s683ms
                 - LoadTime: 0.000ns
                 - ModuleBitcodeSize: 1.98 MB (2073044)
                 - NumFunctions: 32 (32)
                 - NumInstructions: 8.41K (8413)
                 - OptimizationTime: 8s416ms
                 - PeakMemoryUsage: 4.11 MB (4307456)
                 - PrepareTime: 15.357ms
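
As a rough aid for reading the profile above, the following sketch converts the timer strings to seconds and reports each counter's share of the CodeGen total. The counter values are copied from the snippet; the helper `to_seconds` and the unit table are hypothetical and cover only the unit suffixes that appear here (s, ms, us, ns), not the full profile format.

```python
import re

# Timer values copied from the CodeGen profile snippet above.
counters = {
    "Total": "10s115ms",
    "CodegenTime": "530.211us",
    "CompileTime": "1s683ms",
    "LoadTime": "0.000ns",
    "OptimizationTime": "8s416ms",
    "PrepareTime": "15.357ms",
}

# Seconds per unit, limited to the suffixes seen in this snippet (an assumption).
UNIT_SECONDS = {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}

# Try "ms", "us" and "ns" before the bare "s" so that "115ms" is not read as "115m" + "s".
PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|s)")


def to_seconds(text):
    """Convert a timer string such as '8s416ms' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit] for value, unit in PART.findall(text))


total = to_seconds(counters["Total"])
for name, raw in counters.items():
    if name == "Total":
        continue
    seconds = to_seconds(raw)
    print(f"{name:>16}: {seconds:9.6f}s  {seconds / total:6.1%} of total")
```

With these numbers, OptimizationTime is about 83% of the CodeGen total and CompileTime about 17%, while CodegenTime, LoadTime and PrepareTime are negligible.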
      

      Attachments

        1. screenshot-1.png
          9 kB
          Philip Martin

          People

            philip Philip Martin
            alex.behm Alexander Behm