[HIVE-8054] Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] - ASF JIRA

Log work

Agile Board

Rank to Top

Rank to Bottom

Bulk Copy Attachments

Bulk Move Attachments

Voters

Watch issue

Watchers

Convert to Issue

Move

Link

Clone

Labels

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels:

Description

Option hive.optimize.union.remove introduced in ~~HIVE-3276~~ removes union operators from the operator graph in certain cases as an optimization reduce the number of MR jobs. While making sense in MR, this optimization is actually harmful to an execution engine such as Spark, which natives supports union without requiring additional jobs. This is because removing union operator creates disjointed operator graphs, each graph generating a job, and thus this optimization requires more jobs to run the query. Not to mention the additional complexity handling linked FS descriptors.

I propose that we disable such optimization when the execution engine is Spark.