Hive: HIVE-23143

Transactions: PPD in Delete deltas is broken

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      The optimization introduced in HIVE-16812 appears to be broken. Not only is PPD not applied to delete deltas, the optimization also causes wrong results when data column names conflict with the ACID ROW__ID column names (bucket, originalTransactionId, etc.).

      This appears to happen because, after ORC-491, all PPD for ACID ORC files is applied to data columns only. The delete-delta PPD filters are therefore never applied to the metadata columns; they are applied to the data columns instead. When a data column has the same name as a metadata column (like "bucket" in the example below), the query returns wrong results.
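      To make the name conflict concrete, here is a toy model of how a predicate column name could be resolved against an ACID ORC file. It is not Hive or ORC code. It assumes that an ACID file has the top-level metadata fields operation, originalTransaction, bucket, rowId and currentTransaction next to a "row" struct holding the data columns; `resolve_predicate_column` is a hypothetical helper. When resolution is restricted to data columns, a filter on "bucket" silently binds to the data column row.bucket, and a filter on "originalTransaction" binds to nothing, so in this model it cannot prune anything.

```python
# Toy model only: NOT Hive or ORC code. Assumes the ACID ORC layout puts
# these metadata fields at the top level, next to a "row" struct that
# holds the table's data columns.
ACID_METADATA_FIELDS = [
    "operation",
    "originalTransaction",
    "bucket",
    "rowId",
    "currentTransaction",
]


def resolve_predicate_column(name, data_columns, data_columns_only):
    """Hypothetical helper: map a filter column name to a file column path.

    With data_columns_only=True (the post-ORC-491 behaviour described in
    this issue), only the data columns inside "row" are searched, so a
    filter meant for a metadata field binds to a same-named data column
    or to nothing at all.
    """
    if data_columns_only:
        return "row." + name if name in data_columns else None
    # Intended behaviour for delete-delta PPD: metadata fields win.
    if name in ACID_METADATA_FIELDS:
        return name
    return "row." + name if name in data_columns else None


if __name__ == "__main__":
    cols = ["a", "bucket"]  # the table used in the repro
    for field in ("bucket", "originalTransaction"):
        print(field,
              resolve_predicate_column(field, cols, data_columns_only=False),
              resolve_predicate_column(field, cols, data_columns_only=True))
```

      The first case corresponds to the wrong results, the second to PPD silently not happening.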

      Steps to reproduce:

      set hive.fetch.task.conversion=none;
      set hive.query.results.cache.enabled=false;
      create table test(a int, bucket int) stored as orc tblproperties("transactional"="true");
      insert into table test values (1, 1111), (2, 2222), (3, 3333);
      delete from test where a = 2;
      -- Returns the deleted row as well (wrong result):
      select * from test;
      set hive.txn.filter.delete.events=false;
      -- Correct result: the deleted row is not returned:
      select * from test;
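      The outcome of the repro can be mimicked with a simplified model of a read over one base file plus one delete delta. This is a sketch under stated assumptions, not Hive's reader: the key values and the bucket-range filter are made up, and it assumes that delete events carry no data-column values, so a filter misresolved to the data column "bucket" finds no value and discards the delete event.

```python
# Simplified model of `select * from test` over one base file plus one
# delete delta. NOT Hive's reader: the key values, the bucket-range filter
# and the "delete events carry no row data" assumption are illustrative.

# Base rows keyed by (originalTransaction, bucket, rowId) -> data columns.
BASE = {
    (1, 0, 0): {"a": 1, "bucket": 1111},
    (1, 0, 1): {"a": 2, "bucket": 2222},
    (1, 0, 2): {"a": 3, "bucket": 3333},
}

# The delete event produced by `delete from test where a = 2`.
# Assumption: it carries only metadata; its data part is None.
DELETE_EVENTS = [
    {"meta": {"originalTransaction": 1, "bucket": 0, "rowId": 1}, "row": None},
]

# Hypothetical delete-event filter: keep events whose bucket is in [0, 0].
BUCKET_RANGE = (0, 0)


def event_passes_filter(event, misresolved):
    """Evaluate the bucket-range filter on one delete event."""
    lo, hi = BUCKET_RANGE
    if misresolved:
        # Filter bound to the data column "bucket", which the delete event
        # does not carry, so the value is missing and the event is dropped.
        value = (event["row"] or {}).get("bucket")
    else:
        # Intended behaviour: filter bound to the metadata field "bucket".
        value = event["meta"]["bucket"]
    return value is not None and lo <= value <= hi


def select_star(filter_delete_events, misresolved=False):
    """Return the surviving values of column a, sorted."""
    deleted = {
        (e["meta"]["originalTransaction"], e["meta"]["bucket"], e["meta"]["rowId"])
        for e in DELETE_EVENTS
        if not filter_delete_events or event_passes_filter(e, misresolved)
    }
    return sorted(row["a"] for key, row in BASE.items() if key not in deleted)


if __name__ == "__main__":
    print(select_star(True, misresolved=True))   # -> [1, 2, 3] (the bug)
    print(select_star(False))                    # -> [1, 3]
    print(select_star(True, misresolved=False))  # -> [1, 3] (intended)
```

      Calling `select_star(False)` mirrors `set hive.txn.filter.delete.events=false;` in the repro: with no filter, every delete event is applied, so the misresolved column never comes into play.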
      

      cc pvary gopalv

People

    • Assignee: Unassigned
    • Reporter: Abhishek Somani (asomani)
    • Votes: 0
    • Watchers: 3