  Hive / HIVE-14535 add insert-only ACID tables to Hive / HIVE-16017

MM tables - many queries duplicate the data after master merge


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: hive-14535
    • Component/s: None
    • Labels: None

    Description

      Update: it looks like this happens on many more queries. It started after a recent master merge, which came after a period when I had not been working on the feature.

      On the MM branch, this duplicates the data for both MM and non-MM tables: because the original query is a self-union, it essentially outputs the rows 4 times instead of 2.

      The correct inputs appear to be added (in the non-MM case in particular, the inputs are the same as before). Presumably one of the output-related changes on the branch is broken for this case; the exact cause is not yet known.

      CREATE TABLE tbl1_mm(key int, value string) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
      insert overwrite table tbl1_mm select * from src where key < 10;
      
      select key, value from tbl1_mm a where key < 6
      union all
      select key, value from tbl1_mm a where key < 6;
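      One way to quantify the duplication, sketched under the assumption that tbl1_mm has been created and populated as above, is to compare the row count of a single branch with the row count of the self-union. With correct UNION ALL semantics the second count should be exactly twice the first; the behavior described in this issue would make it four times the first. The subquery alias `u` is introduced only for this sketch.

```sql
-- Rows produced by a single branch of the self-union.
SELECT count(*) FROM tbl1_mm WHERE key < 6;

-- Rows actually produced by the self-union.
-- Expected: 2x the count above. Reported on the MM branch: 4x.
SELECT count(*) FROM (
  SELECT key, value FROM tbl1_mm WHERE key < 6
  UNION ALL
  SELECT key, value FROM tbl1_mm WHERE key < 6
) u;
```

      Running the two statements separately avoids routing the diagnostic itself through an additional UNION ALL, which could be affected by the same problem.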
      


          People

            Assignee: Sergey Shelukhin (sershe)
            Reporter: Sergey Shelukhin (sershe)
            Votes: 0
            Watchers: 1
