HIVE-21451 (sub-task of HIVE-20699: Query based compactor for full CRUD Acid tables)

ACID: Avoid using hive.acid.key.index to determine if the file is original or not


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 3.1.1
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      Transactional files written by Hive decorate each row with a ROW__ID column. However, files brought into a transactional table with the LOAD DATA command do not have these metadata columns; in Hive ACID parlance, they are called original files. Rows in original files are assigned an inferred ROW__ID when they are read. After such files are compacted, however, the ROW__ID metadata column becomes part of the compacted file itself.
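      As a rough illustration, the Java sketch below shows how rows of an original file might be given an inferred ROW__ID at read time. The RowId record, the assignInferredRowIds helper, and the numbering scheme (write id 0, a plain bucket number, and row ids counted from a starting offset) are hypothetical simplifications, not Hive's actual reader code. Once such a file is compacted, these values would be stored in the file instead of being computed on every read.

```java
import java.util.ArrayList;
import java.util.List;

public class InferredRowIds {
    // Simplified ROW__ID: (original write id, bucket, row id).
    // Assumption: Hive encodes the bucket differently; a plain number is used here.
    public record RowId(long originalWriteId, int bucket, long rowId) {}

    // Hypothetical helper: rows of an original file carry no stored ROW__ID,
    // so one is inferred from the row's position when the file is read.
    // Assumption: write id 0, and row ids continue from the number of rows
    // already seen in earlier original files of the same bucket (startingOffset).
    public static List<RowId> assignInferredRowIds(int bucket, long startingOffset, int rowCount) {
        List<RowId> ids = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            ids.add(new RowId(0L, bucket, startingOffset + i));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Second original file in bucket 1; the first file held 10 rows.
        for (RowId id : assignInferredRowIds(1, 10, 3)) {
            System.out.println(id);
        }
    }
}
```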

      To determine whether a file is original, we currently check for the presence of the hive.acid.key.index metadata key. Query-based compaction currently does not write hive.acid.key.index (HIVE-21165), so files it produces may still be treated as original files even after compaction, although they already contain ROW__ID.

      Irrespective of HIVE-21165, we should not rely on hive.acid.key.index to decide whether a file is original; instead, we should check whether the file contains ROW__ID. hive.acid.key.index should be treated as a performance optimization, as it was seemingly meant to be.
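      A minimal Java sketch contrasting the two checks, assuming a hypothetical FileInfo type that exposes the top-level field names of a file's schema and its user metadata. The ACID field names below are an assumption modeled on the ORC ACID layout; a real implementation would obtain them through the ORC reader rather than this stand-in. It shows the case described above: a file written by query-based compaction has the ROW__ID columns but no hive.acid.key.index entry, so the key-index check misclassifies it as original while the schema check does not.

```java
import java.util.List;
import java.util.Map;

public class OriginalFileCheck {
    // Hypothetical stand-in for what a reader exposes about a file:
    // the top-level field names of its schema and its user metadata.
    public record FileInfo(List<String> topLevelFields, Map<String, String> userMetadata) {}

    static final String ACID_KEY_INDEX = "hive.acid.key.index";

    // Assumed top-level layout of an ACID file: ROW__ID is made up of
    // originalTransaction, bucket and rowId; user columns live under "row".
    static final List<String> ACID_FIELDS = List.of(
            "operation", "originalTransaction", "bucket",
            "rowId", "currentTransaction", "row");

    // Current approach: a file is "original" if the key index is missing.
    public static boolean isOriginalByKeyIndex(FileInfo file) {
        return !file.userMetadata().containsKey(ACID_KEY_INDEX);
    }

    // Proposed approach: a file is "original" if its schema lacks the ACID
    // metadata columns; the key index plays no part in the decision.
    public static boolean isOriginalBySchema(FileInfo file) {
        return !file.topLevelFields().equals(ACID_FIELDS);
    }

    public static void main(String[] args) {
        // A file brought in with LOAD DATA: plain user columns, no key index.
        FileInfo loaded = new FileInfo(List.of("id", "name"), Map.of());
        // A file written by query-based compaction: ACID columns, no key index.
        FileInfo compacted = new FileInfo(ACID_FIELDS, Map.of());

        System.out.println(isOriginalByKeyIndex(loaded) + " " + isOriginalBySchema(loaded));
        System.out.println(isOriginalByKeyIndex(compacted) + " " + isOriginalBySchema(compacted));
    }
}
```

      With the schema-based check, the key index can still be read when present to speed up reads, but its absence no longer changes how the file is classified.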

People

    • Assignee: Unassigned
    • Reporter: Vaibhav Gumashta (vgumashta)
    • Votes: 0
    • Watchers: 2
