Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
3.1.0
-
None
-
Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
Description
count(distinct a) over (partiton by b) return wring result. For example (T is CTE here):
select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days FROM T
WINDOW specification doesn't affect on results: same wrong with and without window.
count(1) and count(distinct day) return the same result. Count distinct is wrong.
I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and count(distinct return correct result.
Following query return non-empty result:
select A.*, B.days, B. from ( select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days , size(collect_set(day) OVER (PARTITION BY phone)) as days2 , dense_rank() over (partition by phone order by day) + dense_rank() over (partition by phone order by day desc) - 1 as days3 FROM T ) as A join ( select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days FROM T ) as B on A.p=B.p and A.line_number=B.line_number where A.days!=B.days order by A.p, A.line_number