Hive / HIVE-22215

Compaction of sorted table

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Hive
    • Labels: None

      Description

      I recently came across an issue with compacting sorted tables.

      I create two tables and populate them with test data. Both are ACID (insert-only), but only one is sorted:

      USE priv;
      DROP TABLE IF EXISTS test_data;
      DROP TABLE IF EXISTS test_compact_insert_with_sorting;
      DROP TABLE IF EXISTS test_compact_insert_without_sorting;
      
      CREATE TABLE test_data AS SELECT 'foobar' col;
      CREATE TABLE test_compact_insert_with_sorting (col string) CLUSTERED BY (col) SORTED BY (col) INTO 1 BUCKETS
      TBLPROPERTIES ('transactional' = 'true', 'transactional_properties' = 'insert_only');
      
      CREATE TABLE test_compact_insert_without_sorting (col string) CLUSTERED BY (col) INTO 1 BUCKETS
      TBLPROPERTIES ('transactional' = 'true', 'transactional_properties' = 'insert_only');
      
      INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
      INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
      INSERT OVERWRITE TABLE test_compact_insert_with_sorting SELECT col FROM test_data;
      INSERT OVERWRITE TABLE test_compact_insert_without_sorting SELECT col FROM test_data;
      

      As expected, these operations created two base directories for each table:

      $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact_insert*
      Found 2 items
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_0000001
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_0000002
      Found 2 items
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_0000001
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_0000002
      

      But after running manual compaction on those tables:

      USE priv;
      ALTER TABLE test_compact_insert_with_sorting COMPACT 'MAJOR';
      ALTER TABLE test_compact_insert_without_sorting COMPACT 'MAJOR';
      

      It turns out that only the table without sorting got compacted:

      $ hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact*
      Found 2 items
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_0000001
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_with_sorting/base_0000002
      Found 1 items
      drwxrwx---+  - hive hadoop          0 2019-09-18 15:08 /warehouse/tablespace/managed/hive/priv.db/test_compact_insert_without_sorting/base_0000002
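
To compare the listings before and after compaction without reading them by eye, the base directories per table can be counted from the `hdfs dfs -ls` output. The following is a minimal sketch and not part of the original report: the function name `count_base_dirs` is hypothetical, and it assumes that the path is the last whitespace-separated field of each listing line, as in the output above.

```shell
# Hypothetical helper: count base_* directories per table.
# Reads `hdfs dfs -ls` output on stdin and prints "<table> <count>" per table.
count_base_dirs() {
  awk '$NF ~ /\/base_[0-9]+$/ {
         # Split the path on "/"; the table name is the parent of base_N.
         n = split($NF, parts, "/")
         counts[parts[n - 1]]++
       }
       END { for (t in counts) print t, counts[t] }' | LC_ALL=C sort
}
```

It could be used as `hdfs dfs -ls /warehouse/tablespace/managed/hive/priv.db/test_compact* | count_base_dirs`. Lines such as `Found 2 items` are skipped because their last field is not a `base_*` path.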
      

      Inspecting the compactions returns:

      $ beeline -e 'show compactions' | grep priv | grep test_compact
      | 7598474       | priv  | test_compact_insert_with_sorting           |  ---                                   | MAJOR  | succeeded  | master-01.pd.my-domain.com.pl-51  | 1568812155386  | 11         | None                    |
      | 7598475       | priv  | test_compact_insert_without_sorting        |  ---                                   | MAJOR  | succeeded  |  ---                             | 1568812155403  | 298        | None    
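
To put the table name, state, worker and time-taken columns side by side, the pipe-delimited rows can be trimmed down with `awk`. This is a sketch under assumptions, not something from the original report: the function name `summarize_compactions` is hypothetical, and the column positions (table in the 3rd column, state in the 6th, worker in the 7th, time taken in the 9th) are inferred from the two rows shown above and may differ in other Hive versions.

```shell
# Hypothetical helper: reduce `show compactions` rows to four columns.
# Reads pipe-delimited rows on stdin; prints "<table> <state> <worker> <time>".
summarize_compactions() {
  awk -F'|' 'NF >= 10 {
         # Each row starts with "|", so awk field 1 is empty and
         # the Nth table column is awk field N + 1.
         for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
         print $4, $7, $8, $10
       }'
}
```

It could be appended to the command above, for example `beeline -e 'show compactions' | grep priv | grep test_compact | summarize_compactions`.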
      

      Is this by design? Both compactions are reported as 'succeeded', but only the one that actually reduced the number of base directories took a noticeable amount of time (298 versus 11 in the output above). It is also odd that the compaction of the sorted table still has a worker assigned: does that mean it is still in progress?

            People

            • Assignee: Unassigned
            • Reporter: pasza Pawel Jurkiewicz