Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
4.0.0
-
None
Description
Reproducer
create table t1(i int primary key disable rely, j int); insert into t1 values(1,100),(2,200); create table t2(i int primary key disable rely, j int); insert into t2 values(2,1000),(4,500); select i from (select i, j from t1 union all select i,j from t2) subq group by i,j;
Expected Result
2 2 4 1
Actual Result
1 2 4
CBO Plan
HiveAggregate(group=[{0}]) HiveProject(i=[$0], j=[$1]) HiveUnion(all=[true]) HiveProject(i=[$0], j=[$1]) HiveTableScan(table=[[default, t1]], table:alias=[t1]) HiveProject(i=[$0], j=[$1]) HiveTableScan(table=[[default, t2]], table:alias=[t2])
This is due to Group by reduction logic reducing keys when it shouldn't. Because of UNION relative cardinality of the group by keys are changed (they are not PK/UNIQUE anymore). Therefore we shouldn't be trying to reduce group by keys at all.
Attachments
Issue Links
- Dependent
-
HIVE-21382 Group by keys reduction optimization - keys are not reduced in query23
- Closed