Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25147

Limit offset query in CTAS will cause data loss

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.1.0
    • None
    • Parser
    • None

    Description

      query like:

       

      create table ... as select ... from ... limit  offset 
      or:
      insert overwrite table ... select ... from ... limit offset 
      

      will cause data loss.

       

      reproduce step:

      create table test_limit_offset (id int);
      insert into test_limit_offset values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16);
      drop table if exists test_limit_offset2;
      create table test_limit_offset2 as select * from test_limit_offset limit 5 offset 2;
      

      query test_limit_offset2

      +------------------------+
      | test_limit_offset2.id  |
      +------------------------+
      | 5                      |
      | 6                      |
      | 7                      |
      +------------------------+
      
      

      expected 5 numbers while got 3;

      We can see the problem from the execution plan

      +----------------------------------------------------+
      |                      Explain                       |
      +----------------------------------------------------+
      | Plan optimized by CBO.                             |
      |                                                    |
      | Vertex dependency in root stage                    |
      | Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
      |                                                    |
      | Stage-3                                            |
      |   Stats Work{}                                     |
      |     Stage-9                                        |
      |       Create Table Operator:                       |
      |         name:dgz.test_limit_offset2                |
      |         Stage-2                                    |
      |           Dependency Collection{}                  |
      |             Stage-5(CONDITIONAL)                   |
      |               Move Operator                        |
      |                 Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
      |                   Conditional Operator             |
      |                     Stage-1                        |
      |                       Reducer 2                    |
      |                       File Output Operator [FS_6]  |
      |                         table:{"name:":"dgz.test_limit_offset2"} |
      |                         Limit [LIM_5] (rows=5 width=1) |   //reduce side full limit offset
      |                           Number of rows:5,Offset of rows:2 |
      |                           Select Operator [SEL_4] (rows=5 width=1) |
      |                             Output:["_col0"]       |
      |                           <-Map 1 [CUSTOM_SIMPLE_EDGE] |
      |                             PARTITION_ONLY_SHUFFLE [RS_3] |
      |                               Limit [LIM_2] (rows=5 width=1) |  //map side full limit offset
      |                                 Number of rows:5,Offset of rows:2 |
      |                                 Select Operator [SEL_1] (rows=13 width=1) |
      |                                   Output:["_col0"] |
      |                                   TableScan [TS_0] (rows=13 width=1) |
      |                                     dgz@test_limit_offset,test_limit_offset,Tbl:COMPLETE,Col:NONE,Output:["id"] |
      |             Stage-4(CONDITIONAL)                   |
      |               File Merge                           |
      |                  Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
      |             Stage-7                                |
      |               Move Operator                        |
      |                 Stage-6(CONDITIONAL)               |
      |                   File Merge                       |
      |                      Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
      |         Stage-0                                    |
      |           Move Operator                            |
      |              Please refer to the previous Stage-5(CONDITIONAL) |
      |              Please refer to the previous Stage-4(CONDITIONAL) |
      |              Please refer to the previous Stage-7  |
      |                                                    |
      +----------------------------------------------------+
      
      

      It generate limit operator on both map and reduce side. 

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            lalapala gaozhan ding
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: