Hive / HIVE-20699 Query based compactor for full CRUD Acid tables / HIVE-22474

Query based major compaction always creates only one bucket file


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      -- Reproduce on the MR execution engine.
      set hive.execution.engine=mr;
      drop table if exists tbl2;
      -- Full ACID ORC table, bucketed on column a into 2 buckets.
      create table tbl2 (a int, b int)
        clustered by (a) into 2 buckets
        stored as ORC
        TBLPROPERTIES('bucketing_version'='2', 'transactional'='true', 'compactorthreshold.hive.compactor.delta.num.threshold'='3');
      -- Each statement below writes its own delta or delete_delta directory.
      insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
      insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
      delete from tbl2 where b = 2;
      insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);
      delete from tbl2 where a = 1;
      

      With the statements above, the base directory produced by a major compaction contains only one bucket file, although the table is bucketed into 2 buckets. Before the compaction runs, the delta directories contain the expected number of bucket files, and the data is split between them correctly.
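
      A minimal sketch of how the compaction could be triggered and the resulting files inspected. It is not part of the original report and rests on assumptions: query-based compaction is enabled through hive.compactor.crud.query.based (this normally has to be set where the compactor Worker runs, so the session-level set below may not be sufficient), and a compactor Initiator/Worker is running. The result can also be checked by listing the base directory on the file system.

      -- Assumption: enables the query-based compactor for full CRUD tables.
      set hive.compactor.crud.query.based=true;
      -- Request a major compaction and watch its state.
      alter table tbl2 compact 'major';
      show compactions;
      -- After the compaction succeeds, list the files that back the table.
      -- INPUT__FILE__NAME is a Hive virtual column holding each row's file path.
      select INPUT__FILE__NAME, count(*) from tbl2 group by INPUT__FILE__NAME;

      For a table with 2 buckets, rows would be expected to come from two bucket files under the new base directory; the report states that only one is present.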

       

            People

              lpinter László Pintér