IMPALA-12790

ScanNode.getInputCardinality can overestimate if LIMIT is large.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.4.0
    • Component/s: Frontend
    • Labels: None

    Description

      The bug was first found in https://gerrit.cloudera.org/c/20993/1/fe/src/main/java/org/apache/impala/planner/ScanNode.java#338

      For a simple scan query, ScanNode.getInputCardinality() can return a larger number than it should if the query has a LIMIT larger than the table cardinality. The bug is visible in the following test query when a low EXEC_SINGLE_NODE_ROWS_THRESHOLD option is set:

      Section DISTRIBUTEDPLAN of query at line 451:
      select * from functional_kudu.tinytable limit 1000;
      
      Actual does not match expected result:
      PLAN-ROOT SINK
      |
      01:EXCHANGE [UNPARTITIONED]
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      |  limit: 1000
      |
      00:SCAN KUDU [functional_kudu.tinytable]
         limit: 1000
         row-size=43B cardinality=3
      
      Expected:
      PLAN-ROOT SINK
      |
      00:SCAN KUDU [functional_kudu.tinytable]
         limit: 1000
         row-size=43B cardinality=3 

      The distributed plan should not have an EXCHANGE added, since this is a small query (cardinality=3) that can run on the coordinator only.
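
      The overestimate described above can be illustrated with a minimal, self-contained sketch. This is not the actual Impala code or the committed fix: the class name, the method names, the use of -1 for "no limit" and "unknown cardinality", and the threshold value of 100 are all assumptions made for illustration. It only shows why returning the LIMIT as the input cardinality can push a 3-row scan past a low single-node threshold, and how capping the LIMIT by the table cardinality avoids that.

```java
// Hypothetical sketch; names and conventions are illustrative, not Impala's API.
final class InputCardinalitySketch {
  // Assumed marker for "no limit" and for "cardinality unknown".
  static final long UNKNOWN = -1;

  // Overestimating variant: if a limit is present, report the limit itself,
  // even when the table has fewer rows than the limit.
  static long naiveInputCardinality(long tableCardinality, long limit) {
    return limit >= 0 ? limit : tableCardinality;
  }

  // Capped variant: never report more rows than the table is estimated to hold.
  static long cappedInputCardinality(long tableCardinality, long limit) {
    if (limit < 0) return tableCardinality;        // no limit: use table estimate
    if (tableCardinality == UNKNOWN) return limit; // no estimate: limit is the only bound
    return Math.min(limit, tableCardinality);
  }

  // Assumed decision rule: a plan stays on a single node when the number of
  // rows it has to process is known and does not exceed the threshold.
  static boolean fitsSingleNode(long inputCardinality, long rowsThreshold) {
    return inputCardinality != UNKNOWN && inputCardinality <= rowsThreshold;
  }

  public static void main(String[] args) {
    long tableRows = 3;   // cardinality=3 from the plan above
    long limit = 1000;    // LIMIT 1000 from the test query
    long threshold = 100; // hypothetical low EXEC_SINGLE_NODE_ROWS_THRESHOLD

    long naive = naiveInputCardinality(tableRows, limit);
    long capped = cappedInputCardinality(tableRows, limit);

    System.out.println("naive=" + naive + " singleNode=" + fitsSingleNode(naive, threshold));
    System.out.println("capped=" + capped + " singleNode=" + fitsSingleNode(capped, threshold));
  }
}
```

      With these assumed values, the naive estimate is 1000 rows and fails the single-node check, which corresponds to the unexpected EXCHANGE in the actual plan, while the capped estimate is 3 rows and passes it.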

          People

            Assignee: Riza Suminto (rizaon)
            Reporter: Riza Suminto (rizaon)