Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26283

Need better decision making for creating SortedDynPartitionOptimizer

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Logical Optimizer
    • None

    Description

      When the hive.optimize.sort.dynamic.partition.threshold param is set to 0, the optimizer decides whether to create the SortedDynPartitionOptimizer class. 

      In production, we've seen this making the wrong decision when there is a simple INSERT..SELECT into a partitioned table and the data being inserted is skewed towards one partition. 

      In this case, it still is creating the SortedDynPartitionOptimizer.  This forces a reducer step and all the data gets sent to the same reducer.

      In order to reproduce this, you may also have to turn off "autogather" stats since this also will create a reducer step.

      What we ultimately want is just a mapper step so the load is evenly distributed across the mappers.

      Attachments

        Activity

          People

            Unassigned Unassigned
            scarlin Steve Carlin
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated: