Details
-
Bug
-
Status: Resolved
-
Critical
-
Resolution: Fixed
-
Impala 2.12.0, Impala 3.2.0
-
ghx-label-8
Description
skyyws <skyyws@163.com> reported a bug in the mailing list on the following data set:
table A +------+ | a_id | +------+ | 1 | | 2 | +------+ table B +------+--------+ | b_id | amount | +------+--------+ | 1 | 10 | | 1 | 20 | | 2 | NULL | +------+--------+ table C +------+------+ | a_id | b_id | +------+------+ | 1 | 1 | | 2 | 2 | +------+------+
The following query returns a wrong result "1":
select count(1) from (
select t2.a_id,t2.amount1,t2.amount2
from( select a_id from a) t1
left outer join (
select c.a_id,sum(amount) as amount1,sum(amount) as amount2
from b join c on b.b_id = c.b_id group by c.a_id) t2
on t1.a_id = t2.a_id
) t;
Removing "t2.amount2" can get the right result "2":
select count(1) from (
select t2.a_id,t2.amount1
from( select a_id from a) t1
left outer join (
select c.a_id,sum(amount) as amount1,sum(amount) as amount2
from b join c on b.b_id = c.b_id group by c.a_id) t2
on t1.a_id = t2.a_id
) t;
The problem is that in query 1, Impala generates a wrong predicate "sum(amount) = sum(amount)" which rejects nulls:
+-------------------------------------------------------------+ | Explain String | +-------------------------------------------------------------+ | Max Per-Host Resource Reservation: Memory=10.84MB Threads=8 | | Per-Host Resource Estimates: Memory=143MB | | Codegen disabled by planner | | | | PLAN-ROOT SINK | | | | | 12:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | row-size=8B cardinality=1 | | | | | 11:EXCHANGE [UNPARTITIONED] | | | | | 06:AGGREGATE | | | output: count(*) | | | row-size=8B cardinality=1 | | | | | 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] | | | hash predicates: c.a_id = a_id | | | other predicates: sum(amount) = sum(amount) <---------- Wrong inferred predicate which incorrectly reject nulls | | runtime filters: RF000 <- a_id | | | row-size=16B cardinality=2 | | | | | |--10:EXCHANGE [HASH(a_id)] | | | | | | | 00:SCAN HDFS [default.a] | | | partitions=1/1 files=1 size=4B | | | row-size=4B cardinality=2 | | | | | 09:AGGREGATE [FINALIZE] | | | output: sum:merge(amount) | | | group by: c.a_id | | | row-size=12B cardinality=2 | | | | | 08:EXCHANGE [HASH(c.a_id)] | | | | | 04:AGGREGATE [STREAMING] | | | output: sum(amount) | | | group by: c.a_id | | | row-size=12B cardinality=2 | | | | | 03:HASH JOIN [INNER JOIN, BROADCAST] | | | hash predicates: b.b_id = c.b_id | | | runtime filters: RF002 <- c.b_id | | | row-size=16B cardinality=3 | | | | | |--07:EXCHANGE [BROADCAST] | | | | | | | 02:SCAN HDFS [default.c] | | | partitions=1/1 files=1 size=8B | | | runtime filters: RF000 -> default.c.a_id | | | row-size=8B cardinality=2 | | | | | 01:SCAN HDFS [default.b] | | partitions=1/1 files=1 size=15B | | runtime filters: RF002 -> b.b_id | | row-size=8B cardinality=3 | +-------------------------------------------------------------+
Attachments
Issue Links
- causes
-
IMPALA-9358 Query slowdown with inline views and hundreds of columns
- Resolved
- is duplicated by
-
IMPALA-8276 Self equal to self predicate "x = x" generated by Impala caused incorrect query result
- Resolved
- is related to
-
IMPALA-7957 UNION ALL query returns incorrect results
- Resolved
-
IMPALA-8276 Self equal to self predicate "x = x" generated by Impala caused incorrect query result
- Resolved
-
IMPALA-10182 Rows with NULLs filtered out with duplicate columns in subquery select inside UNION ALL
- Resolved
- relates to
-
IMPALA-2643 Nested inline view produces incorrect result when referencing the same column implicitly
- Resolved