Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23265

Duplicate rowsets are returned with Limit and Offset set

    XMLWordPrintableJSON

Details

    Description

      We have a query which produces duplicate results even when there is no duplicate records in underlying tables.

      Sample Query

      select * from orderdatatest_ext order by col1 limit 1000,50
      

      The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
      set hive.vectorized.execution.reduce.enabled=false does not cause the problem.

      Data in table is as follows.

      1,1
      1,2
      1,3
      .
      .
      1,1500
      

      Results with hive.vectorized.execution.reduce.enabled=true

      +-------------------------+-------------------------+
      | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
      +-------------------------+-------------------------+
      | 1                       | 1001                    |
      | 1                       | 1002                    |
      | 1                       | 1003                    |
      | 1                       | 1004                    |
      | 1                       | 1005                    |
      | 1                       | 1006                    |
      | 1                       | 1007                    |
      | 1                       | 1008                    |
      | 1                       | 1009                    |
      | 1                       | 1010                    |
      | 1                       | 1011                    |
      | 1                       | 1012                    |
      | 1                       | 1013                    |
      | 1                       | 1014                    |
      | 1                       | 1015                    |
      | 1                       | 1016                    |
      | 1                       | 1017                    |
      | 1                       | 1018                    |
      | 1                       | 1019                    |
      | 1                       | 1020                    |
      | 1                       | 1021                    |
      | 1                       | 1022                    |
      | 1                       | 1023                    |
      | 1                       | 1024                    |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      | 1                       | 1                       |
      +-------------------------+-------------------------+
      

      Results with hive.vectorized.execution.reduce.enabled=false

      +-------------------------+-------------------------+
      | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
      +-------------------------+-------------------------+
      | 1                       | 1001                    |
      | 1                       | 1002                    |
      | 1                       | 1003                    |
      | 1                       | 1004                    |
      | 1                       | 1005                    |
      | 1                       | 1006                    |
      | 1                       | 1007                    |
      | 1                       | 1008                    |
      | 1                       | 1009                    |
      | 1                       | 1010                    |
      | 1                       | 1011                    |
      | 1                       | 1012                    |
      | 1                       | 1013                    |
      | 1                       | 1014                    |
      | 1                       | 1015                    |
      | 1                       | 1016                    |
      | 1                       | 1017                    |
      | 1                       | 1018                    |
      | 1                       | 1019                    |
      | 1                       | 1020                    |
      | 1                       | 1021                    |
      | 1                       | 1022                    |
      | 1                       | 1023                    |
      | 1                       | 1024                    |
      | 1                       | 1025                    |
      | 1                       | 1026                    |
      | 1                       | 1027                    |
      | 1                       | 1028                    |
      | 1                       | 1029                    |
      | 1                       | 1030                    |
      | 1                       | 1031                    |
      | 1                       | 1032                    |
      | 1                       | 1033                    |
      | 1                       | 1034                    |
      | 1                       | 1035                    |
      | 1                       | 1036                    |
      | 1                       | 1037                    |
      | 1                       | 1038                    |
      | 1                       | 1039                    |
      | 1                       | 1040                    |
      | 1                       | 1041                    |
      | 1                       | 1042                    |
      | 1                       | 1043                    |
      | 1                       | 1044                    |
      | 1                       | 1045                    |
      | 1                       | 1046                    |
      | 1                       | 1047                    |
      | 1                       | 1048                    |
      | 1                       | 1049                    |
      | 1                       | 1050                    |
      +-------------------------+-------------------------+
      

      Table DDL

      CREATE EXTERNAL TABLE  orderdatatest_ext (col1 int, col2 int) stored as orc
      

      Attached sample ORC file.

      Problem appears to be with VectorLimitOperator.

      2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
      2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
      2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
      2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
      
      
      2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
      2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
      

      Attachments

        1. HIVE-23265.2.patch
          4 kB
          Attila Magyar
        2. HIVE-23265.1.patch
          3 kB
          Attila Magyar
        3. 000000_0
          0.3 kB
          Chiran Ravani

        Activity

          People

            amagyar Attila Magyar
            chiran54321 Chiran Ravani
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: