Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-19570

Multiple inserts using "Group by" and "Distinct" generates incorrect results

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Duplicate
    • 1.2.0, 3.0.0
    • None
    • None

    Description

      Repro steps:

      drop database if exists ax1 cascade;
      create database ax1;
      use ax1;
      
      CREATE TABLE 
      	tmp1 ( 
      		v1 string , v2 string , v3 string ) 
      ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '\t' 
      LINES TERMINATED BY '\n' 
      ;
      
      INSERT INTO tmp1
      VALUES 
      ('a', 'b', 'c1') 
      , ('a', 'b', 'c2') 
      , ('d', 'e', 'f') 
      , ('g', 'h', 'i') 
      ;
      
      CREATE TABLE 
      tmp_grouped_by_one_col  ( v1 string , cnt__v2 int , cnt__v3 int ) 
      ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '\t' 
      LINES TERMINATED BY '\n' 
      ;
      
      CREATE TABLE 
      tmp_grouped_by_two_col ( v1 string , v2 string , cnt__v3 int ) 
      ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '\t' 
      LINES TERMINATED BY '\n' 
      ;
      
      CREATE TABLE 
      tmp_grouped_by_all_col ( v1 string , v2 string , v3 string ) 
      ROW FORMAT DELIMITED 
      FIELDS TERMINATED BY '\t' 
      LINES TERMINATED BY '\n' 
      ;
      
      FROM tmp1
      INSERT INTO tmp_grouped_by_one_col 
      SELECT v1, count(distinct v2), count(distinct v3) 
      GROUP BY v1
      INSERT INTO tmp_grouped_by_all_col 
      SELECT v1, v2, v3
      GROUP BY v1, v2, v3
      ;
      
      select 'tmp_grouped_by_one_col',count(*) from tmp_grouped_by_one_col
      union all
      select 'tmp_grouped_by_two_col',count(*) from tmp_grouped_by_two_col
      union all
      select 'tmp_grouped_by_all_col',count(*) from tmp_grouped_by_all_col;
      
      select * from tmp_grouped_by_all_col;
      

      tmp_grouped_by_all_col table should have 4 reocrds but it loads 7 records into the table.

      +----------------------------+----------------------------+----------------------------+--+
      | tmp_grouped_by_all_col.v1  | tmp_grouped_by_all_col.v2  | tmp_grouped_by_all_col.v3  |
      +----------------------------+----------------------------+----------------------------+--+
      | a                          | b                          | b                          |
      | a                          | c1                         | c1                         |
      | a                          | c2                         | c2                         |
      | d                          | e                          | e                          |
      | d                          | f                          | f                          |
      | g                          | h                          | h                          |
      | g                          | i                          | i                          |
      +----------------------------+----------------------------+----------------------------+--+
      

      Attachments

        Issue Links

          Activity

            People

              rtrivedi12 Riju Trivedi
              rtrivedi12 Riju Trivedi
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: