Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-23703

Major QB compaction with multiple FileSinkOperators results in data loss and one original file

    XMLWordPrintableJSON

Details

    Description

      Problems

      Example:

      drop table if exists tbl2;
      create transactional table tbl2 (a int, b int) clustered by (a) into 4 buckets stored as ORC TBLPROPERTIES('transactional'='true','transactional_properties'='default');
      insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4);
      insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4);
      insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);

      E.g. in the example above, bucketId=0 when a=2 and a=6.

      1. Data lossĀ 
      In non-acid tables, an operator's temp files are named with their task id. Because of this snippet, temp files in the FileSinkOperator for compaction tables are identified by their bucket_id.

      if (conf.isCompactionTable()) {
       fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + String.format(AcidUtils.BUCKET_DIGITS, bucketId),
       isNativeTable(), isSkewedStoredAsSubDirectories);
       } else {
       fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), isSkewedStoredAsSubDirectories);
       }
      

      So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and not 000000_0 and 000000_1 as they would normally.
      In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already there with a=6 data, because it too is named bucket_0. You can see in the logs:

       WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target path file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_0000002_v0000013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_00000 with a size 610 exists. Trying to delete it.
      

      2. Results in one original file
      OrcFileMergeOperator merges the results of the FSOp into 1 file named 000000_0.

      Fix

      1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0

      2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named 000000_0, will merge all files named bucket_0 into one file named bucket_0, and so on.

      3. MoveTask will get rid of the taskId directories if present and only move the bucket files in them, in case OrcMergeFileOp is not run.

      Attachments

        Issue Links

          Activity

            People

              klcopp Karen Coppage
              klcopp Karen Coppage
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h 20m
                  1h 20m