Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
0.14.0
-
None
-
None
Description
"min (j) over (order by i rows between 1 preceding and 1 preceding)"
is logically equal to
"lag (j) over (order by i)"
"min (j) over (order by i rows between 1 following and 1 following)"
is logically equal to
"lead (j) over (order by i)"
Has can be seen in the following examples lag/lead works fine but the logically equivalent syntax that uses preceding/following returns wrong results.
$ cat>t.txt
23,29
84,15
58,19
81,17
48,15
36,49
91,26
89,22
63,57
33,10
$ hdfs dfs -mkdir /user/dmarkovitz/t
$ hdfs dfs -put t.txt /user/dmarkovitz/t
$ hive
hive> create external table t (i int,j int) row format delimited fields terminated by ',' location '/user/dmarkovitz/t';
hive> select * from t;
23 29
84 15
58 19
81 17
48 15
36 49
91 26
89 22
63 57
33 10
hive> select i,j,lag (j) over (order by i) as lag,min (j) over (order by i rows between 1 preceding and 1 preceding) as pseudo_lag from t;
23 29 NULL 10
33 10 29 10
36 49 10 10
48 15 49 15
58 19 15 15
63 57 19 17
81 17 57 15
84 15 17 15
89 22 15 15
91 26 22 22
hive> select i,j,lead (j) over (order by i) as lead,min (j) over (order by i rows between 1 following and 1 following) as pseudo_lead from t order by i;
23 29 10 10
33 10 49 10
36 49 15 10
48 15 19 15
58 19 57 15
63 57 17 17
81 17 15 15
84 15 22 15
89 22 26 15
91 26 NULL 22