Hive / HIVE-16791

Tez engine gives incorrect results for SMB map joins, while map join and shuffle join return correct results


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Hive, HiveServer2
    • Labels: None

    Description

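      SMB join gives incorrect results on Tez. The three runs below use different join settings: the map join and shuffle join runs return identical rows, while the SMB join run returns a different value for every row.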

      SMB Join
      set hive.execution.engine=tez;
      set hive.enforce.sortmergebucketmapjoin=false;
      set hive.optimize.bucketmapjoin=true;
      set hive.optimize.bucketmapjoin.sortedmerge=true;
      set hive.auto.convert.sortmerge.join=true;
      set hive.auto.convert.join=true;
      set hive.auto.convert.join.noconditionaltask.size=500000;
      
      OK
      2016	1	11999639
      2016	2	18955110
      2017	2	22217437
      Time taken: 92.647 seconds, Fetched: 3 row(s)
      
      Map Join
      set hive.execution.engine=tez;
      set hive.enforce.sortmergebucketmapjoin=false;
      set hive.optimize.bucketmapjoin=true;
      set hive.optimize.bucketmapjoin.sortedmerge=true;
      set hive.auto.convert.sortmerge.join=true;
      set hive.auto.convert.join=true;
      set hive.auto.convert.join.noconditionaltask.size=50000000;
      OK
      2016	1	26586093
      2016	2	17724062
      2017	2	8862031
      Time taken: 17.49 seconds, Fetched: 3 row(s)
      
      Shuffle Join
      set hive.execution.engine=tez;
      set hive.enforce.sortmergebucketmapjoin=false;
      set hive.optimize.bucketmapjoin=true;
      set hive.optimize.bucketmapjoin.sortedmerge=true;
      set hive.auto.convert.sortmerge.join=false;
      set hive.auto.convert.join=false;
      set hive.auto.convert.join.noconditionaltask.size=50000000;
      
      OK
      2016	1	26586093
      2016	2	17724062
      2017	2	8862031
      Time taken: 38.575 seconds, Fetched: 3 row(s)
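
The three runs above can be compared mechanically. The following is a minimal sketch, not part of the original report: it copies the settings and result rows from the description and reports which settings and which rows differ between runs. The column meanings are not stated in the issue, so the rows are treated as (first column, second column, value); the helper names `differing_settings`, `differing_rows`, and `total` are hypothetical and exist only for this illustration.

```python
# Settings shared by all three runs, copied from the description.
COMMON = {
    "hive.execution.engine": "tez",
    "hive.enforce.sortmergebucketmapjoin": "false",
    "hive.optimize.bucketmapjoin": "true",
    "hive.optimize.bucketmapjoin.sortedmerge": "true",
}

# Per-run settings; only the last three keys vary between runs.
SETTINGS = {
    "smb_join": {**COMMON,
                 "hive.auto.convert.sortmerge.join": "true",
                 "hive.auto.convert.join": "true",
                 "hive.auto.convert.join.noconditionaltask.size": "500000"},
    "map_join": {**COMMON,
                 "hive.auto.convert.sortmerge.join": "true",
                 "hive.auto.convert.join": "true",
                 "hive.auto.convert.join.noconditionaltask.size": "50000000"},
    "shuffle_join": {**COMMON,
                     "hive.auto.convert.sortmerge.join": "false",
                     "hive.auto.convert.join": "false",
                     "hive.auto.convert.join.noconditionaltask.size": "50000000"},
}

# Result rows copied from the description. Column meanings are an
# assumption: the first two columns are treated as a key, the third as a value.
RESULTS = {
    "smb_join": [(2016, 1, 11999639), (2016, 2, 18955110), (2017, 2, 22217437)],
    "map_join": [(2016, 1, 26586093), (2016, 2, 17724062), (2017, 2, 8862031)],
    "shuffle_join": [(2016, 1, 26586093), (2016, 2, 17724062), (2017, 2, 8862031)],
}


def differing_settings(a, b):
    """Return {setting: (value_in_a, value_in_b)} for settings that differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


def differing_rows(baseline, candidate):
    """Return {key: (baseline_value, candidate_value)} for rows that differ."""
    base = {(c1, c2): v for c1, c2, v in baseline}
    cand = {(c1, c2): v for c1, c2, v in candidate}
    return {k: (base.get(k), cand.get(k))
            for k in sorted(base.keys() | cand.keys())
            if base.get(k) != cand.get(k)}


def total(rows):
    """Sum of the third column across all rows."""
    return sum(v for _, _, v in rows)


if __name__ == "__main__":
    baseline = "shuffle_join"
    for name in ("smb_join", "map_join"):
        print(name, "vs", baseline)
        print("  settings:", differing_settings(SETTINGS[name], SETTINGS[baseline]))
        print("  rows:    ", differing_rows(RESULTS[baseline], RESULTS[name]))
        print("  total:   ", total(RESULTS[name]))
```

On the posted data, the SMB join and map join runs differ only in `hive.auto.convert.join.noconditionaltask.size`, and the third column sums to the same total in all three runs even though every SMB join row differs. That is an observation about the posted numbers only; it does not by itself identify the cause.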
      

      Attachments

        1. sample-data.tar.gz-aa
          55.00 MB
          Saumil Mayani
        2. sample-data.tar.gz-ab
          55.00 MB
          Saumil Mayani
        3. sample-data.tar.gz-ac
          55.00 MB
          Saumil Mayani
        4. sample-data.tar.gz-ad
          22.52 MB
          Saumil Mayani
        5. sample-data-query.txt
          5 kB
          Saumil Mayani


          People

            Assignee: Deepak Jaiswal (djaiswal)
            Reporter: Saumil Mayani (smayani)
            Votes: 0
            Watchers: 8

            Dates

              Created:
              Updated:
              Resolved:
