IMPALA-7749 (sub-task of IMPALA-7350: More accurate memory estimates for admission)

Merge aggregation node memory estimate is incorrectly influenced by limit


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Frontend
    • Labels: ghx-label-9

    Description

In the query below, the memory estimate for node 03 (the merge aggregation) is far too low. If the limit is removed, the estimate is correct.

      [localhost:21000] default> set explain_level=2; explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5;
      EXPLAIN_LEVEL set to 2
      Query: explain select l_orderkey, l_partkey, l_linenumber, count(*) from tpch.lineitem group by 1, 2, 3 limit 5
      +-------------------------------------------------------------------------------------------+
      | Explain String                                                                            |
      +-------------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=43.94MB Threads=4                               |
      | Per-Host Resource Estimates: Memory=450MB                                                 |
      |                                                                                           |
      | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                     |
      | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1            |
      | PLAN-ROOT SINK                                                                            |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
      | |                                                                                         |
      | 04:EXCHANGE [UNPARTITIONED]                                                               |
      | |  limit: 5                                                                               |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
      | |  tuple-ids=1 row-size=28B cardinality=5                                                 |
      | |  in pipelines: 03(GETNEXT)                                                              |
      | |                                                                                         |
      | F01:PLAN FRAGMENT [HASH(l_orderkey,l_partkey,l_linenumber)] hosts=3 instances=3           |
      | Per-Host Resources: mem-estimate=10.00MB mem-reservation=1.94MB thread-reservation=1      |
      | 03:AGGREGATE [FINALIZE]                                                                   |
      | |  output: count:merge(*)                                                                 |
      | |  group by: l_orderkey, l_partkey, l_linenumber                                          |
      | |  limit: 5                                                                               |
      | |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0  |
      | |  tuple-ids=1 row-size=28B cardinality=5                                                 |
      | |  in pipelines: 03(GETNEXT), 00(OPEN)                                                    |
      | |                                                                                         |
      | 02:EXCHANGE [HASH(l_orderkey,l_partkey,l_linenumber)]                                     |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                                |
      | |  tuple-ids=1 row-size=28B cardinality=6001215                                           |
      | |  in pipelines: 00(GETNEXT)                                                              |
      | |                                                                                         |
      | F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                            |
      | Per-Host Resources: mem-estimate=440.27MB mem-reservation=42.00MB thread-reservation=2    |
      | 01:AGGREGATE [STREAMING]                                                                  |
      | |  output: count(*)                                                                       |
      | |  group by: l_orderkey, l_partkey, l_linenumber                                          |
      | |  mem-estimate=176.27MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
      | |  tuple-ids=1 row-size=28B cardinality=6001215                                           |
      | |  in pipelines: 00(GETNEXT)                                                              |
      | |                                                                                         |
      | 00:SCAN HDFS [tpch.lineitem, RANDOM]                                                      |
      |    partitions=1/1 files=1 size=718.94MB                                                   |
      |    stored statistics:                                                                     |
      |      table: rows=6001215 size=718.94MB                                                    |
      |      columns: all                                                                         |
      |    extrapolated-rows=disabled max-scan-range-rows=1068457                                 |
      |    mem-estimate=264.00MB mem-reservation=8.00MB thread-reservation=1                      |
      |    tuple-ids=0 row-size=20B cardinality=6001215                                           |
      |    in pipelines: 00(GETNEXT)                                                              |
      +-------------------------------------------------------------------------------------------+
      

      The bug is that we use cardinality_ to cap the estimated number of distinct values, but cardinality_ itself has already been capped at the output limit. The merge aggregation's memory estimate is therefore sized as if it only had to hold 5 groups, when it actually has to aggregate all ~6M distinct groups before the limit is applied.
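      The effect can be sketched with a toy model (this is NOT Impala's actual planner code; the function and variable names are illustrative, with the 28-byte row size and ~6M group count taken from the plan above):

      ```python
      # Toy model of the estimate: memory for a grouping aggregation is
      # roughly (estimated distinct groups) * (row size per group).
      # If the node's cardinality is capped at the output LIMIT *before*
      # the distinct-value estimate is taken, the cap leaks into the
      # memory estimate.

      def agg_mem_estimate(input_groups, limit, row_size):
          # cardinality_ analogue: capped at the limit when one is present
          cardinality = min(input_groups, limit) if limit is not None else input_groups
          # distinct-value count is capped by cardinality and drives the estimate
          return cardinality * row_size

      ROW_SIZE = 28          # bytes per grouped row (row-size=28B in the plan)
      GROUPS = 6_001_215     # distinct (l_orderkey, l_partkey, l_linenumber) groups

      buggy = agg_mem_estimate(GROUPS, 5, ROW_SIZE)       # 140 bytes: absurdly low
      correct = agg_mem_estimate(GROUPS, None, ROW_SIZE)  # ~160 MB: plausible
      print(buggy, correct)
      ```

      The fix direction this implies: the aggregation must size its hash tables from the pre-limit group count, since the limit only bounds rows returned, not the groups that must be built before any row can be emitted.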


      People

        Assignee: poojanilangekar Pooja Nilangekar
        Reporter: tarmstrong Tim Armstrong
