  1. Hive
  2. HIVE-10607

Combination of ReducesinkDedup + TopN optimization yields incorrect result if there are multiple GBY in reducer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0, 0.14.0, 1.0.0, 1.1.0
    • Fix Version/s: 1.2.0
    • Component/s: Logical Optimizer, Tez
    • Labels: None

    Description

      select ctinyint, count(cdouble) from (select ctinyint, cdouble from alltypesorc group by ctinyint, cdouble) t1 group by ctinyint order by ctinyint limit 20;
      

      This query returns different result sets depending on which optimizations are enabled. In particular, in the .q test environment, the following two invocations produce different results:

      *   mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q -Dhive.optimize.reducededuplication.min.reducer=1 -Dhive.limit.pushdown.memory.usage=0.3f
      
      *   mvn test -Phadoop-2 -Dtest.output.overwrite=true -Dtest=TestMiniTezCliDriver -Dqfile=test.q 
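
      The same comparison can be sketched as an interactive session. This is a minimal sketch, not a confirmed reproduction: it assumes the alltypesorc test table is already loaded, that the session runs on Tez as in TestMiniTezCliDriver, and that setting the two properties from the first invocation by hand has the same effect as passing them with -D.

```sql
-- Run 1: apply the two overrides from the first mvn invocation.
-- Assumption: these session-level settings mirror the -D flags above.
set hive.optimize.reducededuplication.min.reducer=1;
set hive.limit.pushdown.memory.usage=0.3;

select ctinyint, count(cdouble)
from (select ctinyint, cdouble
      from alltypesorc
      group by ctinyint, cdouble) t1
group by ctinyint
order by ctinyint
limit 20;

-- Run 2: start a fresh session WITHOUT the two overrides above,
-- run the identical query, and compare the two result sets row by row.
select ctinyint, count(cdouble)
from (select ctinyint, cdouble
      from alltypesorc
      group by ctinyint, cdouble) t1
group by ctinyint
order by ctinyint
limit 20;
```

      Because the query has an explicit order by and limit, both runs should return identical rows; any difference in the count(cdouble) column points at the optimizer settings rather than at row ordering.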
      

      Attachments

        1. HIVE-10607.patch
          12 kB
          Ashutosh Chauhan

            People

              Assignee: ashutoshc Ashutosh Chauhan
              Reporter: ashutoshc Ashutosh Chauhan
              Votes: 0
              Watchers: 2
