[HIVE-12391] SkewJoinOptimizer might not kick in if columns are renamed after TableScanOperator

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.0, 2.0.0
Fix Version/s: 1.3.0, 2.0.0
Component/s: Logical Optimizer
Labels: None
Target Version/s: 1.3.0, 2.0.0

Description

SkewJoinOptimizer will not kick in if the columns are merely renamed after the TableScanOperator (TS), e.g. by the creation of a derived table.

To reproduce, consider the following example:

set hive.optimize.skewjoin.compiletime = true;

CREATE TABLE T1(key STRING, val STRING)
SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;

CREATE TABLE T2(key STRING, val STRING)
SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;

For this query, SkewJoinOptimizer kicks in:

SELECT a.*, b.*
FROM T1 a JOIN T2 b
ON a.key = b.key;

For this one, it does not:

SELECT a.*, b.*
FROM 
  (SELECT key as k, val as v FROM T1) a
  JOIN
  (SELECT key as k, val as v FROM T2) b
ON a.k = b.k;

The reason is that SkewJoinOptimizer does not backtrack a column to its origin. Instead, it relies only on the column's name to decide whether the column is produced by a given TS.
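The difference between matching by name and backtracking to the origin can be illustrated with a small model. This is only a sketch under stated assumptions, not Hive's code and not the attached patch: `TableScan`, `Select`, `backtrack`, and the two `is_skewed_*` helpers are hypothetical names invented for the illustration, and the operator tree is reduced to a chain of renaming projections over a single scan.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class TableScan:
    """Hypothetical stand-in for a scan over a (possibly skewed) table."""
    table: str
    columns: Tuple[str, ...]
    skewed_columns: Tuple[str, ...] = ()


@dataclass
class Select:
    """Hypothetical projection: maps each output name to a parent column."""
    parent: Union["Select", TableScan]
    renames: Dict[str, str]  # output column -> column in parent


def find_scan(op) -> TableScan:
    """Walk up the chain of projections to the underlying scan."""
    while isinstance(op, Select):
        op = op.parent
    return op


def is_skewed_by_name(op, column: str) -> bool:
    """Name-only check: compare the join column's name with the scan's
    skewed columns, ignoring any renaming in between."""
    return column in find_scan(op).skewed_columns


def backtrack(op, column: str) -> Optional[Tuple[str, str]]:
    """Resolve a column to (table, original column), undoing each rename."""
    while isinstance(op, Select):
        if column not in op.renames:
            return None  # the column is not produced by this projection
        column = op.renames[column]
        op = op.parent
    return (op.table, column) if column in op.columns else None


def is_skewed_by_origin(op, column: str) -> bool:
    """Origin-based check: backtrack first, then test for skew."""
    origin = backtrack(op, column)
    return origin is not None and origin[1] in find_scan(op).skewed_columns


# T1 is skewed on `key`; the derived table exposes it as `k`.
t1 = TableScan("T1", ("key", "val"), skewed_columns=("key",))
derived = Select(t1, {"k": "key", "v": "val"})

print(is_skewed_by_name(t1, "key"))       # True: no rename, names match
print(is_skewed_by_name(derived, "k"))    # False: `k` is not a skewed name
print(is_skewed_by_origin(derived, "k"))  # True: `k` resolves to T1.key
```

In this model the name-only check misses the skew as soon as `key` is exposed as `k`, which mirrors the second query above, while the origin-based check still finds it.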

Attachments

HIVE-12391.patch (16 kB), attached 12/Nov/15 14:42 by jcamachorodriguez

Issue Links

blocks: HIVE-12017 Do not disable CBO by default when number of joins in a query is equal or less than 1 (Closed)

People

Assignee: Jesús Camacho Rodríguez
Reporter: Jesús Camacho Rodríguez
Votes: 0
Watchers: 2

Dates

Created: 12/Nov/15 14:11
Updated: 27/Feb/24 22:23
Resolved: 13/Nov/15 09:23