Hive / HIVE-19653

Incorrect predicate pushdown for groupby with grouping sets


Details

    Description

      Consider the following query:

      CREATE TABLE T1(a STRING, b STRING, s BIGINT);
      INSERT OVERWRITE TABLE T1 VALUES ('aaaa', 'bbbb', 123456);
      
      SELECT * FROM (
      SELECT a, b, sum(s)
      FROM T1
      GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b))
      ) t WHERE a IS NOT NULL;
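      To reproduce the behaviour reported below, the two session properties named in this issue can be set before running the query. This is a configuration sketch only; it assumes a Hive session (for example Beeline) in which SET is permitted for these properties.

      ```sql
      -- Enable predicate pushdown and disable the cost-based optimizer,
      -- matching the conditions under which the wrong result was observed.
      SET hive.optimize.ppd=true;
      SET hive.cbo.enable=false;
      ```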
      

      With hive.optimize.ppd enabled (and hive.cbo.enable=false), the query returns:

      NULL	NULL	123456
      NULL	bbbb	123456
      aaaa	NULL	123456
      aaaa	bbbb	123456
      

      The predicate "a IS NOT NULL" has no effect: rows where a is NULL are still returned, which is incorrect. The correct result contains only the last two rows, where a = 'aaaa'.

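      When pushing a predicate down through a GroupBy (GBY) operator, PPD must first check that every grouping set contains the columns the predicate references. Otherwise the value of the expression changes across the GBY: for a grouping set that omits a column, the GBY outputs NULL for that column regardless of the input value. A filter evaluated on the input rows (here "a IS NOT NULL", which is true for 'aaaa') therefore cannot remove the NULL rows produced for the () and (b) grouping sets, and the result is wrong.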
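      A sketch of why the filter must stay above the GBY, assuming the usual semantics of GROUPING SETS: the query behaves like a UNION ALL of ordinary GROUP BYs in which each column missing from a grouping set is a NULL literal. This rewrite is only an illustration (it does not reproduce GROUPING__ID and is not the patch attached to this issue); the column alias total is introduced here for readability.

      ```sql
      -- Illustrative equivalent of GROUP BY a, b GROUPING SETS ((), (a), (b), (a, b)):
      -- each branch is one grouping set, and columns absent from that set are NULL literals.
      SELECT * FROM (
        SELECT a, b, sum(s) AS total FROM T1 GROUP BY a, b
        UNION ALL
        SELECT a, CAST(NULL AS STRING) AS b, sum(s) AS total FROM T1 GROUP BY a
        UNION ALL
        SELECT CAST(NULL AS STRING) AS a, b, sum(s) AS total FROM T1 GROUP BY b
        UNION ALL
        SELECT CAST(NULL AS STRING) AS a, CAST(NULL AS STRING) AS b, sum(s) AS total FROM T1
      ) t
      -- The filter has to be evaluated here, after aggregation: in the (b) and ()
      -- branches a is a NULL literal whatever T1.a contains, so applying
      -- "a IS NOT NULL" to T1 instead would not remove those rows.
      WHERE a IS NOT NULL;
      ```

      On the single-row T1 from the description, this should return only the rows aaaa NULL 123456 and aaaa bbbb 123456 (in no guaranteed order). Conversely, with GROUPING SETS ((a), (a, b)) column a appears in every set, so every output value of a comes from the input and pushing "a IS NOT NULL" below the GROUP BY is safe, as the rule above requires.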

      Attachments

        1. HIVE-19653.1.patch
          6 kB
          Zhang Li
        2. HIVE-19653.patch
          5 kB
          Zhang Li


          People

            dengzh Zhihua Deng
            richox Zhang Li
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 2h