IMPALA-9671

Improve SINGULAR ROW SRC Node Explain Output


Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: Frontend

    Description

      For queries that involve more than one level of unnesting of complex/nested types, the explain output can be hard to read and reason about. The SUBPLAN node produces a tree shape that differs from that of other node types. In particular, it can be hard to tell what a SINGULAR ROW SRC node is acting on or producing.

      Currently, the explain output for a SINGULAR ROW SRC node gives no indication of what the node operates on. Although it is not guaranteed, leaf nodes in an Impala plan tree are usually annotated in square brackets "[]" with the input source they work on (SCAN and UNNEST nodes, for example), but SINGULAR ROW SRC has no such annotation. It would be great to fix this so that, in explain strings,

      SINGULAR ROW SRC 

      becomes

      SINGULAR ROW SRC [input]
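To make the proposed shape concrete, here is a minimal sketch of a header formatter that appends a bracketed annotation only when one is available. The function `node_header` and its parameters are hypothetical names chosen for illustration; this is not how Impala's frontend builds explain strings.

```python
from typing import Optional, Sequence


def node_header(node_id: int, display_name: str,
                inputs: Optional[Sequence[str]] = None) -> str:
    """Build a header of the form "NN:NAME [input, ...]".

    Hypothetical helper that only mirrors the shape of the explain output;
    it is not Impala's implementation.
    """
    header = f"{node_id:02d}:{display_name}"
    if inputs:
        # SCAN and UNNEST headers list their input source in square brackets.
        header += " [" + ", ".join(inputs) + "]"
    return header


print(node_header(2, "SINGULAR ROW SRC"))             # current shape
print(node_header(2, "SINGULAR ROW SRC", ["input"]))  # proposed shape
```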

      Take the query below (SET EXPLAIN_LEVEL=3):

      Query: explain select c_custkey, o_orderkey, l_partkey from customer c, c.c_orders o, o.o_lineitems as li
      +----------------------------------------------------------------------------------------+
      | Explain String                                                                         |
      +----------------------------------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=16.00MB Threads=3                            |
      | Per-Host Resource Estimates: Memory=274MB                                              |
      | Analyzed query: SELECT c_custkey, o_orderkey, l_partkey FROM                           |
      | tpch_nested_parquet.customer c, c.c_orders o, o.o_lineitems li                         |
      |                                                                                        |
      | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                  |
      | |  Per-Host Resources: mem-estimate=10.06MB mem-reservation=0B thread-reservation=1    |
      | PLAN-ROOT SINK                                                                         |
      | |  output exprs: c_custkey, o_orderkey, l_partkey                                      |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                             |
      | |                                                                                      |
      | 09:EXCHANGE [UNPARTITIONED]                                                            |
      | |  mem-estimate=10.06MB mem-reservation=0B thread-reservation=0                        |
      | |  tuple-ids=2,1,0 row-size=48B cardinality=15.00M                                     |
      | |  in pipelines: 00(GETNEXT)                                                           |
      | |                                                                                      |
      | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                         |
      | Per-Host Resources: mem-estimate=264.00MB mem-reservation=16.00MB thread-reservation=2 |
      | 01:SUBPLAN                                                                             |
      | |  mem-estimate=0B mem-reservation=0B thread-reservation=0                             |
      | |  tuple-ids=2,1,0 row-size=48B cardinality=15.00M                                     |
      | |  in pipelines: 00(GETNEXT)                                                           |
      | |                                                                                      |
      | |--08:NESTED LOOP JOIN [CROSS JOIN]                                                    |
      | |  |  mem-estimate=20B mem-reservation=0B thread-reservation=0                         |
      | |  |  tuple-ids=2,1,0 row-size=48B cardinality=100                                     |
      | |  |  in pipelines: 00(GETNEXT)                                                        |
      | |  |                                                                                   |
      | |  |--02:SINGULAR ROW SRC                                                              |
      | |  |     parent-subplan=01                                                             |
      | |  |     mem-estimate=0B mem-reservation=0B thread-reservation=0                       |
      | |  |     tuple-ids=0 row-size=20B cardinality=1                                        |
      | |  |     in pipelines: 00(GETNEXT)                                                     |
      | |  |                                                                                   |
      | |  04:SUBPLAN                                                                          |
      | |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
      | |  |  tuple-ids=2,1 row-size=28B cardinality=100                                       |
      | |  |  in pipelines: 00(GETNEXT)                                                        |
      | |  |                                                                                   |
      | |  |--07:NESTED LOOP JOIN [CROSS JOIN]                                                 |
      | |  |  |  mem-estimate=20B mem-reservation=0B thread-reservation=0                      |
      | |  |  |  tuple-ids=2,1 row-size=28B cardinality=10                                     |
      | |  |  |  in pipelines: 00(GETNEXT)                                                     |
      | |  |  |                                                                                |
      | |  |  |--05:SINGULAR ROW SRC                                                           |
      | |  |  |     parent-subplan=04                                                          |
      | |  |  |     mem-estimate=0B mem-reservation=0B thread-reservation=0                    |
      | |  |  |     tuple-ids=1 row-size=20B cardinality=1                                     |
      | |  |  |     in pipelines: 00(GETNEXT)                                                  |
      | |  |  |                                                                                |
      | |  |  06:UNNEST [o.o_lineitems li]                                                     |
      | |  |     parent-subplan=04                                                             |
      | |  |     mem-estimate=0B mem-reservation=0B thread-reservation=0                       |
      | |  |     tuple-ids=2 row-size=0B cardinality=10                                        |
      | |  |     in pipelines: 00(GETNEXT)                                                     |
      | |  |                                                                                   |
      | |  03:UNNEST [c.c_orders o]                                                            |
      | |     parent-subplan=01                                                                |
      | |     mem-estimate=0B mem-reservation=0B thread-reservation=0                          |
      | |     tuple-ids=1 row-size=0B cardinality=10                                           |
      | |     in pipelines: 00(GETNEXT)                                                        |
      | |                                                                                      |
      | 00:SCAN HDFS [tpch_nested_parquet.customer c, RANDOM]                                  |
      |    HDFS partitions=1/1 files=4 size=289.13MB                                           |
      |    predicates: !empty(c.c_orders)                                                      |
      |    predicates on o: !empty(o.o_lineitems)                                              |
      |    stored statistics:                                                                  |
      |      table: rows=150.00K size=289.13MB                                                 |
      |      columns missing stats: c_orders                                                   |
      |    extrapolated-rows=disabled max-scan-range-rows=50.11K                               |
      |    mem-estimate=264.00MB mem-reservation=16.00MB thread-reservation=1                  |
      |    tuple-ids=0 row-size=20B cardinality=150.00K                                        |
      |    in pipelines: 00(GETNEXT)                                                           |
      +----------------------------------------------------------------------------------------+

      It's easy to figure out what node 05 is doing, but it is harder to understand what node 02 is doing.
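Until the annotation exists, one way to trace these nodes is to group the plan nodes by their `parent-subplan=` line. The sketch below parses explain text for that purpose; the regular expressions and the function name `group_by_parent_subplan` are assumptions based only on the output shown above, so other explain levels or Impala versions may need adjustments.

```python
import re
from collections import defaultdict

# "NN:NODE NAME [annotation]" at the end of a line, optionally followed by
# the table border. The lookbehind skips fragment ids such as "F01:".
NODE_RE = re.compile(
    r"(?<![A-Za-z0-9])(\d+):([A-Z][A-Z ]*?)(?: \[(.*?)\])?\s*\|?\s*$")
PARENT_RE = re.compile(r"parent-subplan=(\d+)")


def group_by_parent_subplan(explain_lines):
    """Map each SUBPLAN id to the node headers that name it as parent."""
    groups = defaultdict(list)
    current = None  # header of the node whose detail lines are being read
    for line in explain_lines:
        node = NODE_RE.search(line)
        if node:
            node_id, name, annotation = node.groups()
            current = f"{node_id}:{name}"
            if annotation:
                current += f" [{annotation}]"
            continue
        parent = PARENT_RE.search(line)
        if parent and current:
            groups[parent.group(1)].append(current)
    return dict(groups)


# A few lines copied from the explain output above (padding shortened).
sample = [
    "| |  |--02:SINGULAR ROW SRC            |",
    "| |  |     parent-subplan=01           |",
    "| |  |  |--05:SINGULAR ROW SRC         |",
    "| |  |  |     parent-subplan=04        |",
    "| |  |  06:UNNEST [o.o_lineitems li]   |",
    "| |  |     parent-subplan=04           |",
    "| |  03:UNNEST [c.c_orders o]          |",
    "| |     parent-subplan=01              |",
]
for subplan, members in sorted(group_by_parent_subplan(sample).items()):
    print(subplan, members)
```

On these lines, subplan 04 groups node 05 with `06:UNNEST [o.o_lineitems li]`, and subplan 01 groups node 02 with `03:UNNEST [c.c_orders o]`.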

      One option would be for 02 to have the following annotation or something else more informative:

      SINGULAR ROW SRC [c.c_orders o, o.o_lineitems li]
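The issue does not say how such an annotation would be computed. As one possible rule, assumed here purely for illustration, a SINGULAR ROW SRC could list the UNNEST inputs found under its sibling in the join, ordered by node id. The `PlanNode` class and `suggested_label` function below are hypothetical and only model the plan shown above; they are not Impala classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Toy stand-in for a plan node (hypothetical, not an Impala class)."""
    node_id: int
    name: str
    annotation: Optional[str] = None
    children: List["PlanNode"] = field(default_factory=list)


def suggested_label(join: PlanNode) -> str:
    """Label the SINGULAR ROW SRC child of `join` using an assumed rule:
    collect the UNNEST annotations under the join's other children."""
    siblings = [c for c in join.children if c.name != "SINGULAR ROW SRC"]
    found = []
    stack = list(siblings)
    while stack:
        node = stack.pop()
        if node.name == "UNNEST" and node.annotation:
            found.append((node.node_id, node.annotation))
        stack.extend(node.children)
    inputs = [annotation for _, annotation in sorted(found)]
    return "SINGULAR ROW SRC [" + ", ".join(inputs) + "]"


# The subplan part of the example plan, built by hand from the output above.
n06 = PlanNode(6, "UNNEST", "o.o_lineitems li")
n05 = PlanNode(5, "SINGULAR ROW SRC")
n07 = PlanNode(7, "NESTED LOOP JOIN", "CROSS JOIN", [n05, n06])
n03 = PlanNode(3, "UNNEST", "c.c_orders o")
n04 = PlanNode(4, "SUBPLAN", None, [n07, n03])
n02 = PlanNode(2, "SINGULAR ROW SRC")
n08 = PlanNode(8, "NESTED LOOP JOIN", "CROSS JOIN", [n02, n04])

print(suggested_label(n08))  # label for node 02
print(suggested_label(n07))  # label for node 05
```

Under this rule node 02 gets the annotation suggested above, and node 05 gets `[o.o_lineitems li]`. Other rules, for example naming the tuple the node emits, might be more informative.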

          People

            Assignee: Unassigned
            Reporter: Shant Hovsepian (superdupershant)
