HIVE-22413

Avoid dirty read when reading the ACID table while compaction is running

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Transactions
    • Labels: None

    Description

      A dirty read can occur when an ACID table is read while the compactor is still creating base or delta directories. It is especially likely on S3 storage, because a “move” on S3 is implemented as “copy and delete”, and the copy takes a long time when the files are large or the bucket count is high.

      The proposed logic to avoid this problem is as follows. While listing the child directories of the partition directory to read the ACID table, check whether any “_tmp”-prefixed directories exist. If they do, compare the names of the directories inside each “_tmp” directory with the names of the partition's child directories, and skip any child directory whose name matches. The reader then uses the files that existed before merging, so the query results are unchanged.
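
      The filtering step described above can be sketched roughly as follows. This is only an illustration in plain Java, not the code from the attached patch: the class name CompactionAwareListing, the method visibleDirs, and the in-memory map that stands in for a file system listing are all hypothetical, and the sketch assumes the “_tmp” directories themselves are never treated as data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not taken from HIVE-22413.1.patch. */
public class CompactionAwareListing {
    /** Prefix of the compactor's temporary directories, as named in the description. */
    static final String TMP_PREFIX = "_tmp";

    /**
     * Returns the child directories that are safe to read.
     *
     * @param children    names of the child directories of the partition directory
     * @param tmpContents for each "_tmp"-prefixed child, the names of the directories
     *                    inside it (assumption: stands in for a real directory listing)
     */
    public static List<String> visibleDirs(List<String> children,
                                           Map<String, List<String>> tmpContents) {
        // Collect the names of all directories the compactor is still producing.
        Set<String> inProgress = new HashSet<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                inProgress.addAll(tmpContents.getOrDefault(child, Collections.emptyList()));
            }
        }

        List<String> visible = new ArrayList<>();
        for (String child : children) {
            if (child.startsWith(TMP_PREFIX)) {
                continue; // temporary directory itself is not data (assumption)
            }
            if (inProgress.contains(child)) {
                continue; // same name as a directory under "_tmp": still being written
            }
            visible.add(child);
        }
        return visible;
    }
}
```

      For example, with the illustrative children delta_0000001_0000001, delta_0000002_0000002, base_0000002 and _tmp_abc, where _tmp_abc contains base_0000002, the sketch keeps only the two delta directories, so the reader falls back to the files that existed before merging.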

      Attachments

        1. HIVE-22413.1.patch
          3 kB
          Hocheol Park

          People

            Assignee: Unassigned
            Reporter: Hocheol Park (hocha.park)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated: