Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
1.3.0, 2.0.0
None
Description
SkewJoinOptimizer does not kick in if the columns are merely renamed after the TableScan (TS) operator, e.g. by the creation of a derived table.
To reproduce, consider the following example:
set hive.optimize.skewjoin.compiletime = true;
CREATE TABLE T1(key STRING, val STRING)
SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;
CREATE TABLE T2(key STRING, val STRING)
SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;
For this query, SkewJoinOptimizer kicks in:
SELECT a.*, b.* FROM T1 a JOIN T2 b ON a.key = b.key;
For this one, it does not:
SELECT a.*, b.* FROM (SELECT key as k, val as v FROM T1) a JOIN (SELECT key as k, val as v FROM T2) b ON a.k = b.k;
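The reason is that SkewJoinOptimizer does not backtrack the join column to its origin. Instead, it relies only on the column's name to decide whether the column is produced by a given TS, so a renamed column such as k is never matched to the skewed column key.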
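The difference between matching by name and backtracking can be illustrated with a small, self-contained sketch. This is not Hive's code: the class ColumnOriginSketch, the Op node type, and the methods matchesByName and backtrack are all hypothetical names made up for this illustration, and the operator tree is reduced to table scans and rename-only projections.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Hive.
class ColumnOriginSketch {

    /** Simplified operator: a table scan (parent == null) or a rename-only projection. */
    static final class Op {
        final String table;                // table name, set only for table scans
        final Op parent;                   // null for table scans
        final Map<String, String> outToIn; // output column -> parent's column

        private Op(String table, Op parent, Map<String, String> outToIn) {
            this.table = table;
            this.parent = parent;
            this.outToIn = outToIn;
        }

        static Op scan(String table) {
            return new Op(table, null, new HashMap<>());
        }

        static Op select(Op parent, Map<String, String> outToIn) {
            return new Op(null, parent, outToIn);
        }
    }

    /** Name-only check: fails as soon as the join column has been renamed. */
    static boolean matchesByName(String joinColumn, String skewedColumn) {
        return joinColumn.equals(skewedColumn);
    }

    /**
     * Walks from the given operator up to its table scan, undoing each rename.
     * Returns "table.column", or null if the column is not a plain rename.
     */
    static String backtrack(Op op, String column) {
        while (op.parent != null) {
            column = op.outToIn.get(column);
            if (column == null) {
                return null; // not traceable to a base column in this sketch
            }
            op = op.parent;
        }
        return op.table + "." + column;
    }

    public static void main(String[] args) {
        // Models: (SELECT key as k, val as v FROM T1) a
        Map<String, String> renames = new HashMap<>();
        renames.put("k", "key");
        renames.put("v", "val");
        Op a = Op.select(Op.scan("T1"), renames);

        System.out.println(matchesByName("k", "key")); // prints "false"
        System.out.println(backtrack(a, "k"));         // prints "T1.key"
    }
}
```

With the name-only check, the join column k of the derived table never matches the skewed column key of T1. After backtracking, k resolves to T1.key, which can then be compared against the table's skew metadata.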
Attachments
Issue Links
- blocks: HIVE-12017 Do not disable CBO by default when number of joins in a query is equal or less than 1 (Closed)