Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-9949

Subqueries in select can result in rows not being returned

    XMLWordPrintableJSON

Details

    Description

      IMPALA-8954 added support for uncorrelated subqueries but some do not return correct results. Both of those queries should return rows with NULLs where the subquery returned 0 rows.

      [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
      Fetched 0 row(s) in 0.16s
      
      [localhost.EXAMPLE.COM:21000] default> select (select min(int_col) from functional.alltypes limit 0) from functional.alltypestiny;
      Fetched 0 row(s) in 0.14s
      

      The problem is that the CROSS JOIN will return 0 rows if the subquery returns 0 rows.

      [localhost.EXAMPLE.COM:21000] default> explain select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny;
      Query: explain select (select min(int_col) from functional.alltypes having min(int_col) < 0) from functional.alltypestiny
      +-------------------------------------------------------------+
      | Explain String                                              |
      +-------------------------------------------------------------+
      | Max Per-Host Resource Reservation: Memory=40.00KB Threads=5 |
      | Per-Host Resource Estimates: Memory=180MB                   |
      | Codegen disabled by planner                                 |
      |                                                             |
      | PLAN-ROOT SINK                                              |
      | |                                                           |
      | 03:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                 |
      | |  row-size=4B cardinality=8                                |
      | |                                                           |
      | |--06:EXCHANGE [UNPARTITIONED]                              |
      | |  |                                                        |
      | |  00:SCAN HDFS [functional.alltypestiny]                   |
      | |     HDFS partitions=4/4 files=4 size=460B                 |
      | |     row-size=0B cardinality=8                             |
      | |                                                           |
      | 05:AGGREGATE [FINALIZE]                                     |
      | |  output: min:merge(int_col)                               |
      | |  having: min(int_col) < 0                                 |
      | |  row-size=4B cardinality=1                                |
      | |                                                           |
      | 04:EXCHANGE [UNPARTITIONED]                                 |
      | |                                                           |
      | 02:AGGREGATE                                                |
      | |  output: min(int_col)                                     |
      | |  row-size=4B cardinality=1                                |
      | |                                                           |
      | 01:SCAN HDFS [functional.alltypes]                          |
      |    HDFS partitions=24/24 files=24 size=478.45KB             |
      |    row-size=4B cardinality=7.30K                            |
      +-------------------------------------------------------------+
      Fetched 29 row(s) in 0.04s
      

      We need to detect cases where the subquery can return 0 rows and instead insert a left outer join.

      I did this in a patch and it fixed the issue.

      Attachments

        Issue Links

          Activity

            People

              tarmstrong Tim Armstrong
              tarmstrong Tim Armstrong
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: