Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-3450

ExecNodes with LIMITs have cardinality estimates that are too large

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 2.5.0
    • None
    • Frontend

    Description

      set NUM_NODES=0;
      select * from tpch.lineitem UNION ALL (select * from tpch.lineitem) LIMIT 1
      
      Summary
      Operator          #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail        
      -----------------------------------------------------------------------------------------------------------
      00:UNION               1   85.646us   85.646us      1      12.00M  200.00 KB        -1.00 B                
      |--02:SCAN HDFS        1  177.568us  177.568us      0       6.00M          0        -1.00 B  tpch.lineitem 
      01:SCAN HDFS           1  141.077ms  141.077ms  1.02K       6.00M  202.96 MB        -1.00 B  tpch.lineitem
      

      Note 00:UNION estimates 12M rows of output even though the limit is pushed to its children.

      Interestingly, if you look at the distributed plan:

      Distributed Plan
      Operator          #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail        
      -----------------------------------------------------------------------------------------------------------
      03:EXCHANGE            1   14.515us   14.515us      1           1          0        -1.00 B  UNPARTITIONED 
      00:UNION               3   92.674us  144.611us      3      12.00M  200.00 KB              0                
      |--02:SCAN HDFS        3  180.513us  221.991us      0       6.00M          0      264.00 MB  tpch.lineitem 
      01:SCAN HDFS           3  155.859ms  163.053ms  3.07K       6.00M   65.18 MB      264.00 MB  tpch.lineitem
      

      The xchg node correctly reflects the limit, but the UNION still doesn't have the right estimate.

      This affects runtime filters which use the single-node cardinalities to estimate the NDV for a particular filter.

      Attachments

        Activity

          People

            jbapple Jim Apple
            henryr Henry Robinson
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: