Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-5016

Missed opportunities for static partition pruning with COALESCE()

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, Impala 2.8.0
    • Impala 2.10.0
    • Frontend

    Description

      TBH I'm not sure how general this issue is, so rather than attempting to express it as abstractly as possible, I'll just start with a specific example (apologies in advance for formatting...horrible JIRA artist here):

      # create two tables -- the main point here is that t1 has 2 partitions
      
      > create table t1 (i int) partitioned by (part int) ;
      > create table t2 (i int) partitioned by (part int) ;
      > insert into t1 (part, i) values (1, 1), (2, 2) ;
      > insert into t2 (part, i) values (1, 1), (2, 2) ;
      
      # query 1: coalesce t1 partition column with literal value — partition pruning kicks in
      
      > explain with results as ( select coalesce(t1.i, t2.i) as i, coalesce(t1.part, 666) as part from t1 full outer join t2 on (t1.i = t2.i) ) select * from results where part = 1;
      Query: explain with results as ( select coalesce(t1.i, t2.i) as i, coalesce(t1.part, 666) as part from t1 full outer join t2 on (t1.i = t2.i) ) select * from results where part = 1
      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      <SNIP>
      | 00:SCAN HDFS [default.t1]                                                          |
      |    partitions=1/2 files=1 size=2B                                                  |
      +------------------------------------------------------------------------------------+
      
      # ^^^note # partitions scanned^^^
      
      # query 2: coalesce t1 partition column against dynamic value — partition pruning does NOT kick in
      
      > explain with results as ( select coalesce(t1.i, t2.i) as i, coalesce(t1.part, t2.part) as part from t1 full outer join t2 on (t1.i = t2.i) ) select * from results where part = 1;
      Query: explain with results as ( select coalesce(t1.i, t2.i) as i, coalesce(t1.part, t2.part) as part from t1 full outer join t2 on (t1.i = t2.i) ) select * from results where part = 1
      +------------------------------------------------------------------------------------+
      | Explain String                                                                     |
      +------------------------------------------------------------------------------------+
      <SNIP>
      | 00:SCAN HDFS [default.t1]                                                          |
      |    partitions=2/2 files=2 size=4B                                                  |
      +------------------------------------------------------------------------------------+
      
      # ^^^note # partitions scanned^^^
      

      In both of those queries, we're applying a filter predicate to a column that is defined as follows:

      coalesce(t1.part, _)

      Since t1.part is a partition column in t1, every row in t1 has a non-null value for t1.part. Furthermore, because t1.part appears as the first parameter to coalesce, every row in the result set that contains any data from t1 will get its "part" value from t1. Thus, all t1 data in the result set will be subject to the filter predicate in t1.part – i.e. all t1 partitions other than (part=1) could have been pruned during query planning.

      Question: what is wrong with the above reasoning? am I missing something fundamental/obvious here?

      Thanks!

      Attachments

        Issue Links

          Activity

            People

              feng_xiao_yu fengYu
              wdf William Forson
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: