Hive / HIVE-25071

Number of reducers limited to fixed 1 when updating/deleting

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed

    Description

      When updating or deleting rows in bucketed tables, an extra ReduceSink (RS) operator is created to enforce bucketing. After HIVE-22538, the number of reducers in these RS operators is limited to a fixed 1.

      This can lead to performance degradation.

      Prior to HIVE-22538, multiple reducers were available in such cases. The reason for limiting the number of reducers is to ensure ascending RowId order in the delete delta files produced by update/delete statements.
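To illustrate why the ascending order matters, here is a minimal sketch of a merge-style reader that applies delete events to base rows in a single forward pass. It is not Hive's actual reader code; the `apply_deletes` function and the `(writeId, bucketId, rowId)` tuple are simplified, hypothetical stand-ins, and the assumption is only that the reader expects both inputs in ascending order.

```python
from typing import Iterable, List, Tuple

# Simplified, hypothetical stand-in for a RowId: (writeId, bucketId, rowId).
RowId = Tuple[int, int, int]


def apply_deletes(rows: Iterable[RowId], deletes: List[RowId]) -> List[RowId]:
    """Sort-merge filter: one forward pass over both inputs.

    Assumes `rows` and `deletes` are both in ascending order.
    """
    survivors = []
    i = 0
    for row in rows:
        # Advance the delete cursor past keys smaller than the current row.
        while i < len(deletes) and deletes[i] < row:
            i += 1
        if i < len(deletes) and deletes[i] == row:
            continue  # the row matches a delete event, so it is dropped
        survivors.append(row)
    return survivors


rows = [(1, 0, r) for r in range(5)]
sorted_deletes = [(1, 0, 1), (1, 0, 3)]
unsorted_deletes = [(1, 0, 3), (1, 0, 1)]

print(apply_deletes(rows, sorted_deletes))    # rows 1 and 3 are removed
print(apply_deletes(rows, unsorted_deletes))  # row 1 wrongly survives
```

With the unsorted input the cursor never looks back, so the delete event for row 1 is skipped and the deleted row is still returned.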

      This is the plan of a delete statement such as:

      DELETE FROM t1 WHERE a = 1;
      
      TS[0]-FIL[8]-SEL[2]-RS[3]-SEL[4]-RS[5]-SEL[6]-FS[7]
      

      RowId order is ensured by RS[3], and bucketing is enforced by RS[5], whose number of reducers was limited to the number of buckets in the table or to hive.exec.reducers.max. However, RS[5] does not provide any ordering, so the above plan may generate unsorted delete deltas, which leads to corrupted data reads.
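The effect of a sorting shuffle followed by a non-sorting one can be sketched with a toy simulation. This is not Hive or Tez code: the `shuffle` helper, the partition keys, and the deterministic task-by-task arrival order are all assumptions made for illustration (in a real engine the arrival order is not deterministic).

```python
from typing import Callable, List, Tuple

# Simplified, hypothetical stand-in for a RowId: (writeId, bucketId, rowId).
RowId = Tuple[int, int, int]


def shuffle(
    task_outputs: List[List[RowId]],
    num_reducers: int,
    partition_key: Callable[[RowId], int],
    sort: bool = False,
) -> List[List[RowId]]:
    """Route every row to a reducer; sort each reducer's input only if asked."""
    reducers: List[List[RowId]] = [[] for _ in range(num_reducers)]
    for output in task_outputs:  # rows arrive one upstream task after another
        for row in output:
            reducers[partition_key(row) % num_reducers].append(row)
    if sort:
        for rows in reducers:
            rows.sort()
    return reducers


# One map task emitting rows of two buckets in arbitrary order.
map_output = [[(1, b, r) for r in (3, 1, 0, 2) for b in (1, 0)]]

# Stage 1 plays the role of RS[3]: sorts by RowId (partition key is illustrative).
stage1 = shuffle(map_output, 2, partition_key=lambda row: row[2], sort=True)
# Stage 2 plays the role of RS[5]: partitions by bucket, provides no ordering.
stage2 = shuffle(stage1, 2, partition_key=lambda row: row[1], sort=False)

for bucket, rows in enumerate(stage2):
    print(bucket, rows, "sorted" if rows == sorted(rows) else "NOT sorted")
```

Each stage 1 reducer emits sorted rows, but every stage 2 reducer receives the concatenation of two sorted runs, so the rowIds arrive as 0, 2, 1, 3 and the order established by the first stage is lost.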

      Prior to HIVE-22538, these RS operators were merged by ReduceSinkDeduplication, and the resulting RS kept the ordering while still allowing multiple reducers. This was possible because ReduceSinkDeduplication was prepared for ACID writes. That handling was removed by HIVE-22538 to make ReduceSinkDeduplication more generic.
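A single merged shuffle that both partitions by bucket and sorts by RowId might look like the sketch below. It is a toy model, not the output of ReduceSinkDeduplication: `merged_reduce_sink` is a hypothetical name, and taking the minimum of the bucket count and `max_reducers` (standing in for hive.exec.reducers.max) is an assumption about how the two limits combine.

```python
from typing import List, Tuple

# Simplified, hypothetical stand-in for a RowId: (writeId, bucketId, rowId).
RowId = Tuple[int, int, int]


def merged_reduce_sink(
    rows: List[RowId], num_buckets: int, max_reducers: int
) -> List[List[RowId]]:
    """One shuffle that partitions by bucket and sorts by RowId."""
    # Assumption: the reducer count is capped by both limits.
    num_reducers = max(1, min(num_buckets, max_reducers))
    reducers: List[List[RowId]] = [[] for _ in range(num_reducers)]
    for row in rows:
        bucket_id = row[1]
        reducers[bucket_id % num_reducers].append(row)
    for reducer_rows in reducers:
        # Sorting happens in the same shuffle, so no later stage can undo it.
        reducer_rows.sort()
    return reducers


rows = [(1, b, r) for r in (3, 0, 2, 1) for b in (1, 0, 3, 2)]
out = merged_reduce_sink(rows, num_buckets=4, max_reducers=2)
for i, reducer_rows in enumerate(out):
    print(i, reducer_rows)
```

Because partitioning and sorting happen in one step, every reducer's output is ascending by RowId even when more than one reducer runs, and rows of the same bucket stay contiguous.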

            People

              Assignee: Krisztian Kasa (kkasa)
              Reporter: Krisztian Kasa (kkasa)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h