Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3668

Incorrect results FIRST_VALUE function

    XMLWordPrintableJSON

Details

    Description

      Query returns incorrect results for first_value column on developers private branch.

      Query results from Drill (6 rows)

      0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
      +----------+-------+----------+---------+-------+--------------+-------------+
      |    c1    |  c2   | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
      +----------+-------+----------+---------+-------+--------------+-------------+
      | -1       | e     | e        | null    | 1     | e            | e           |
      | 19       | null  | null     | null    | 1     | e            | null        |
      | 65536    | null  | null     | null    | 1     | e            | null        |
      | 15       | e     | null     | e       | 2     | e            | e           |
      | 1000000  | null  | null     | null    | 2     | e            | null        |
      | null     | null  | null     | null    | 3     | e            | null        |
      +----------+-------+----------+---------+-------+--------------+-------------+
      6 rows selected (0.269 seconds)
      
      

      Query results from Postgres for same input data (2 rows returned)

      postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 = 'e' ORDER BY tile, c1;
       c1 | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
      ----+----+---------+--------+------+-------------+------------
       -1 | e  | e       |        |    1 | e           | e
       15 | e  |         | e      |    2 | e           | e
      (2 rows)
      
      

      Another query that returns different results

      Results returned by Drill - zero rows returned

      0: jdbc:drill:schema=dfs.tmp> SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM `tblWnulls.parquet`) sub_query where firstVal_c2 is null ORDER BY tile, c1;
      +-----+-----+----------+---------+-------+--------------+-------------+
      | c1  | c2  | lead_c2  | lag_c2  | tile  | firstVal_c2  | lastVal_c2  |
      +-----+-----+----------+---------+-------+--------------+-------------+
      +-----+-----+----------+---------+-------+--------------+-------------+
      No rows selected (0.279 seconds)
      

      Results returned by Postgres for same input data, 4 rows returned

      postgres=# SELECT * FROM (SELECT c1, c2, lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) lead_c2, lag(c2) OVER ( PARTITION BY c2 ORDER BY c1) lag_c2, ntile(3) OVER ( PARTITION BY c2 ORDER BY c1) tile, first_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) firstVal_c2, last_value(c2) OVER ( PARTITION BY c2 ORDER BY c1) lastVal_c2 FROM t222) sub_query where firstVal_c2 is null ORDER BY tile, c1;
         c1    | c2 | lead_c2 | lag_c2 | tile | firstval_c2 | lastval_c2 
      ---------+----+---------+--------+------+-------------+------------
            19 |    |         |        |    1 |             | 
         65536 |    |         |        |    1 |             | 
       1000000 |    |         |        |    2 |             | 
               |    |         |        |    3 |             | 
      (4 rows)
      

      Table definition on Postgres

      postgres=# \d t222
              Table "public.t222"
       Column |     Type     | Modifiers 
      --------+--------------+-----------
       c1     | integer      | 
       c2     | character(1) | 
      

      Data from the parquet file used in above queries on Drill

      0: jdbc:drill:schema=dfs.tmp> select * from `tblWnulls.parquet`;
      +-------------+-------+
      |     c1      |  c2   |
      +-------------+-------+
      | 1           | a     |
      | 2           | b     |
      | 13          | c     |
      | 4           | c     |
      | 5           | a     |
      | 6           | c     |
      | null        | d     |
      | 17          | b     |
      | 8           | c     |
      | 9           | b     |
      | 10          | d     |
      | 2147483647  | d     |
      | 10          | a     |
      | 11          | a     |
      | null        | c     |
      | 11          | d     |
      | 12          | c     |
      | 19          | null  |
      | 13          | b     |
      | 14          | a     |
      | 13          | c     |
      | 15          | e     |
      | -1          | e     |
      | 0           | a     |
      | 2147483647  | d     |
      | null        | d     |
      | 65536       | null  |
      | 1000000     | null  |
      | null        | null  |
      | 11111       | a     |
      +-------------+-------+
      30 rows selected (0.145 seconds)
      

      Attachments

        Issue Links

          Activity

            People

              adeneche Abdel Hakim Deneche
              khfaraaz Khurram Faraaz
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: