Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
4.0.0-beta-1
-
- Docker version : 19.03.6
- Hive version : 4.0.0-beta-1
- Driver version : Hive JDBC (4.0.0-beta-1)
- Beeline version : 4.0.0-beta-1
Description
Summary
The incremental rebuild plan and execution output are incorrect when one side of the table join has inserted/deleted join keys that the other side has deleted/inserted (note the order).
The argument is that tuples that have never been present simultaneously should not interact with one another, i.e., one's inserts should not join the other's deletes.
Related Test Case
The bug was discovered during replication of the test case:
hive/ql/src/test/queries/clientpositive/materialized_view_create_rewrite_5.q
Steps to Reproduce the Issue
- Configurations:
SET hive.vectorized.execution.enabled=false; set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; set hive.strict.checks.cartesian.product=false; set hive.materializedview.rewriting=true;
create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into cmv_basetable_n6 values (1, 'alfred', 10.30, 2), (1, 'charlie', 20.30, 2);
create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d int) stored as orc TBLPROPERTIES ('transactional'='true');
insert into cmv_basetable_2_n3 values (1, 'bob', 30.30, 2), (1, 'bonnie', 40.30, 2);
CREATE MATERIALIZED VIEW cmv_mat_view_n6 TBLPROPERTIES ('transactional'='true') AS SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a) WHERE cmv_basetable_2_n3.c > 10.0;
show tables;
- Select tuples, including deletion and with VirtualColumn's, from the MV and source tables. We see that the MV is correctly built upon creation:
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
- Now make an insert to the LHS and a delete to the RHS source table:
insert into cmv_basetable_n6 values (1, 'kevin', 50.30, 2); DELETE FROM cmv_basetable_2_n3 WHERE b = 'bonnie';
- Select again to see what happened:
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_n6('acid.fetch.deleted.rows'='true');
SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_basetable_2_n3('acid.fetch.deleted.rows'='true');
- Use EXPLAIN CBO to produce the incremental rebuild plan for the MV, which is incorrect already:
EXPLAIN CBO ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
- Rebuild MV and see (incorrect) results:
ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD; SELECT ROW__IS__DELETED, ROW__ID, * FROM cmv_mat_view_n6('acid.fetch.deleted.rows'='true');
- Run MV definition directly, which outputs incorrect results because the MV is enabled for MV-based query rewrite, i.e., the following query will output what's in the MV for the time being:
SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a) WHERE cmv_basetable_2_n3.c > 10.0;
- Disable MV-based query rewrite for the MV and re-run the definition, which should give the correct results:
ALTER MATERIALIZED VIEW cmv_mat_view_n6 DISABLE REWRITE; SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = cmv_basetable_2_n3.a) WHERE cmv_basetable_2_n3.c > 10.0;
Note
This issue is also seen in update-incurred inserts/deletes.
Attachments
Attachments
Issue Links
- is blocked by
-
HIVE-28050 Disable Incremental non aggregated materialized view rebuild in presence of delete operations
- Closed
- relates to
-
HIVE-27945 Add 4.0 known issues into TSB
- Closed
- links to