[HIVE-22808] HiveRelFieldTrimmer does not handle HiveTableFunctionScan

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 4.0.0-alpha-1
Component/s: Query Planning
Labels: None

Description

Repro

CREATE TABLE table_16 (
timestamp_col_19    timestamp,
timestamp_col_29    timestamp,
int_col_27          int,
int_col_39          int,
boolean_col_18      boolean,
varchar0045_col_23  varchar(45)
);


CREATE TABLE table_7 (
int_col_10      int,
bigint_col_3    bigint
);

CREATE TABLE table_10 (
boolean_col_8       boolean,
boolean_col_16      boolean,
timestamp_col_5     timestamp,
timestamp_col_15    timestamp,
timestamp_col_30    timestamp,
decimal3825_col_26  decimal(38, 25),
smallint_col_9      smallint,
int_col_18          int
);

explain cbo 
SELECT
    DISTINCT COALESCE(a4.timestamp_col_15, IF(a4.boolean_col_16, a4.timestamp_col_30, a4.timestamp_col_5)) AS timestamp_col
FROM table_7 a3
RIGHT JOIN table_10 a4 
WHERE (a3.bigint_col_3) >= (a4.int_col_18)
INTERSECT ALL
SELECT
    COALESCE(LEAST(
        COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
        COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
        ),
        GREATEST(COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)),
        COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
    ) AS timestamp_col
FROM table_16 a1
    GROUP BY COALESCE(LEAST(
        COALESCE(a1.timestamp_col_19, CAST('2010-03-29 00:00:00' AS TIMESTAMP)),
        COALESCE(a1.timestamp_col_29, CAST('2014-08-16 00:00:00' AS TIMESTAMP))
    ),
    GREATEST(
        COALESCE(a1.timestamp_col_19, CAST('2013-07-01 00:00:00' AS TIMESTAMP)),
        COALESCE(a1.timestamp_col_29, CAST('2028-06-18 00:00:00' AS TIMESTAMP)))
    );

The CBO plan contains projections that carry unnecessary columns, in some cases every column of a table, for example:

                          HiveProject(int_col_10=[$0], bigint_col_3=[$1], BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])

Cause
The plan contains a HiveTableFunctionScan operator:

HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])

HiveTableFunctionScan is handled by neither HiveRelFieldTrimmer nor RelFieldTrimmer, which are supposed to remove unused columns in the CalcitePlanner.applyPreJoinOrderingTransforms(...) phase. As a result, the whole subtree rooted at HiveTableFunctionScan is left untrimmed.
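
The effect can be illustrated with a toy model. The sketch below is not Calcite or Hive code: the `Node` class, the `trim` method and the `handleTableFunction` flag are hypothetical names made up for this illustration, and projections are simplified to pass columns through by name. It only shows the general mechanism: a top-down trimmer that gives up on a node kind it does not recognise also stops descending, so everything below that node keeps its full width.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TrimSketch {
    /** Toy plan node with at most one input; a hypothetical stand-in for a RelNode. */
    public static final class Node {
        public final String kind;          // "Scan", "Project" or "TableFunctionScan"
        public final List<String> fields;  // output column names
        public final Node input;           // null for a leaf

        public Node(String kind, List<String> fields, Node input) {
            this.kind = kind;
            this.fields = fields;
            this.input = input;
        }
    }

    /** Keeps only the columns in 'used', pushing the requirement down to the input. */
    public static Node trim(Node node, Set<String> used, boolean handleTableFunction) {
        switch (node.kind) {
            case "Scan":
            case "Project": {
                List<String> kept = new ArrayList<>();
                for (String f : node.fields) {
                    if (used.contains(f)) {
                        kept.add(f);
                    }
                }
                // Simplification: a Project passes columns through by name,
                // so it needs exactly the columns it keeps.
                Node in = node.input == null
                        ? null
                        : trim(node.input, new LinkedHashSet<>(kept), handleTableFunction);
                return new Node(node.kind, kept, in);
            }
            case "TableFunctionScan": {
                if (!handleTableFunction) {
                    // Unknown node kind: hand it back intact and do not descend.
                    return node;
                }
                // Assumed handling: the function may read every column of its
                // input, so request all of them but keep trimming below.
                Node in = trim(node.input, new LinkedHashSet<>(node.input.fields),
                        handleTableFunction);
                return new Node(node.kind, node.fields, in);
            }
            default:
                return node;
        }
    }

    /** Column counts of each node from the root down to the leaf. */
    public static List<Integer> widths(Node node) {
        List<Integer> out = new ArrayList<>();
        for (Node n = node; n != null; n = n.input) {
            out.add(n.fields.size());
        }
        return out;
    }

    /** TableFunctionScan over a narrow Project over a wide Project over a Scan. */
    public static Node samplePlan() {
        List<String> all = Arrays.asList("int_col_10", "bigint_col_3",
                "BLOCK__OFFSET__INSIDE__FILE", "INPUT__FILE__NAME");
        Node scan = new Node("Scan", all, null);
        Node wide = new Node("Project", all, scan);
        Node narrow = new Node("Project", Arrays.asList("bigint_col_3"), wide);
        return new Node("TableFunctionScan", Arrays.asList("$f0", "$f1"), narrow);
    }

    public static void main(String[] args) {
        Node plan = samplePlan();
        Set<String> used = new LinkedHashSet<>(plan.fields);
        System.out.println(widths(trim(plan, used, false)));
        System.out.println(widths(trim(plan, used, true)));
    }
}
```

With the flag off, the wide Project and the Scan under the table function keep all four columns (widths `[2, 1, 4, 4]`); with it on, they shrink to the single column that is actually used (widths `[2, 1, 1, 1]`). How the actual patch handles HiveTableFunctionScan is not described here, so the "request all input columns" strategy is an assumption of this sketch.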

Whole plan:

CBO PLAN:
HiveProject($f0=[$1])
  HiveTableFunctionScan(invocation=[replicate_rows($0, $1)], rowType=[RecordType(BIGINT $f0, TIMESTAMP(9) $f1)])
    HiveProject($f0=[$2], $f1=[$0])
      HiveFilter(condition=[=($1, 2)])
        HiveAggregate(group=[{0}], agg#0=[count($1)], agg#1=[min($1)])
          HiveProject($f0=[$0], $f1=[$1])
            HiveUnion(all=[true])
              HiveProject($f0=[$0], $f1=[$1])
                HiveAggregate(group=[{0}], agg#0=[count()])
                  HiveProject($f0=[$0])
                    HiveAggregate(group=[{0}])
                      HiveProject($f0=[CASE(IS NOT NULL($7), $7, if($5, $8, $6))])
                        HiveJoin(condition=[>=($1, $13)], joinType=[inner], algorithm=[none], cost=[not available])
                          HiveProject(int_col_10=[$0], bigint_col_3=[$1], BLOCK__OFFSET__INSIDE__FILE=[$2], INPUT__FILE__NAME=[$3], CAST=[CAST($4):RecordType(BIGINT writeid, INTEGER bucketid, BIGINT rowid)])
                            HiveFilter(condition=[IS NOT NULL($1)])
                              HiveTableScan(table=[[default, table_7]], table:alias=[a3])
                          HiveProject(boolean_col_16=[$0], timestamp_col_5=[$1], timestamp_col_15=[$2], timestamp_col_30=[$3], int_col_18=[$4], BLOCK__OFFSET__INSIDE__FILE=[$5], INPUT__FILE__NAME=[$6], ROW__ID=[$7], CAST=[CAST($4):BIGINT])
                            HiveFilter(condition=[IS NOT NULL(CAST($4):BIGINT)])
                              HiveTableScan(table=[[default, table_10]], table:alias=[a4])
              HiveProject($f0=[$0], $f1=[$1])
                HiveAggregate(group=[{0}], agg#0=[count()])
                  HiveProject($f0=[$0])
                    HiveAggregate(group=[{0}])
                      HiveProject($f0=[CASE(IS NOT NULL(least(CASE(IS NOT NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16 00:00:00:TIMESTAMP(9)))), least(CASE(IS NOT NULL($0), $0, 2010-03-29 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2014-08-16 00:00:00:TIMESTAMP(9))), greatest(CASE(IS NOT NULL($0), $0, 2013-07-01 00:00:00:TIMESTAMP(9)), CASE(IS NOT NULL($1), $1, 2028-06-18 00:00:00:TIMESTAMP(9))))])
                        HiveTableScan(table=[[default, table_16]], table:alias=[a1])

Attachments

HIVE-22808.1.patch  31/Jan/20 14:47  5 kB   Krisztian Kasa
HIVE-22808.2.patch  04/Feb/20 07:17  9 kB   Krisztian Kasa
HIVE-22808.2.patch  04/Feb/20 05:26  9 kB   Krisztian Kasa
HIVE-22808.3.patch  04/Feb/20 09:29  10 kB  Krisztian Kasa
HIVE-22808.4.patch  05/Feb/20 05:33  11 kB  Krisztian Kasa
HIVE-22808.5.patch  05/Feb/20 12:53  10 kB  Krisztian Kasa

Issue Links

links to

Review Board
