Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-21387

Wrong result for UNION query with GROUP BY consisting of PK columns

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 4.0.0
    • 4.0.0-alpha-1
    • Query Planning
    • None

    Description

      Reproducer

      create table t1(i int primary key disable rely, j int);
      insert into t1 values(1,100),(2,200);
      create table t2(i int primary key disable rely, j int);
      insert into t2 values(2,1000),(4,500);
      
      select i from (select i, j from t1 union all select i,j from t2) subq group by i,j;
      

      Expected Result

      2
      2
      4
      1
      

      Actual Result

      1
      2
      4
      

      CBO Plan

      HiveAggregate(group=[{0}])
        HiveProject(i=[$0], j=[$1])
          HiveUnion(all=[true])
            HiveProject(i=[$0], j=[$1])
              HiveTableScan(table=[[default, t1]], table:alias=[t1])
            HiveProject(i=[$0], j=[$1])
              HiveTableScan(table=[[default, t2]], table:alias=[t2])
      

      This is due to Group by reduction logic reducing keys when it shouldn't. Because of UNION relative cardinality of the group by keys are changed (they are not PK/UNIQUE anymore). Therefore we shouldn't be trying to reduce group by keys at all.

      Attachments

        Issue Links

          Activity

            People

              vgarg Vineet Garg
              vgarg Vineet Garg
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: