  1. Hive
  2. HIVE-7292 Hive on Spark
  3. HIVE-8054

Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]


Details

    Description

      The option hive.optimize.union.remove, introduced in HIVE-3276, removes union operators from the operator graph in certain cases as an optimization to reduce the number of MR jobs. While this makes sense for MR, the optimization is actually harmful for an execution engine such as Spark, which natively supports union without requiring additional jobs. This is because removing a union operator creates disjoint operator graphs, each of which generates its own job, so the optimization requires more jobs to run the query. It also adds the complexity of handling linked FS descriptors.

      I propose that we disable this optimization when the execution engine is Spark.
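
The proposed behavior could be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class UnionRemoveGuard and the method shouldRemoveUnion are hypothetical names, a plain Map stands in for Hive's configuration object, and the fallback values ("false" for hive.optimize.union.remove, "mr" for hive.execution.engine) are assumptions made for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: decides whether the union-remove optimization should run.
// A plain Map is used here in place of Hive's real configuration class.
public class UnionRemoveGuard {
    static final String ENGINE_KEY = "hive.execution.engine";
    static final String UNION_REMOVE_KEY = "hive.optimize.union.remove";

    /** Returns true only if the option is enabled and the engine is not Spark. */
    static boolean shouldRemoveUnion(Map<String, String> conf) {
        // Assumed fallback values for this sketch: option off, engine "mr".
        boolean requested =
            Boolean.parseBoolean(conf.getOrDefault(UNION_REMOVE_KEY, "false"));
        String engine = conf.getOrDefault(ENGINE_KEY, "mr");
        // Spark handles union natively, so the optimization is skipped there.
        return requested && !"spark".equalsIgnoreCase(engine);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(UNION_REMOVE_KEY, "true");

        conf.put(ENGINE_KEY, "mr");
        System.out.println(shouldRemoveUnion(conf)); // prints "true"

        conf.put(ENGINE_KEY, "spark");
        System.out.println(shouldRemoveUnion(conf)); // prints "false"
    }
}
```

With such a guard, a user setting hive.optimize.union.remove=true would still get the optimization on MR, while on Spark the union operators would stay in the operator graph.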

      Attachments

        1. HIVE-8054-spark.patch
          136 kB
          Na Yang
        2. HIVE-8054.3-spark.patch
          145 kB
          Na Yang
        3. HIVE-8054.2-spark.patch
          141 kB
          Na Yang

          People

            nyang Na Yang
            xuefuz Xuefu Zhang
            Votes: 0
            Watchers: 3
