Hive / HIVE-16177

non Acid to acid conversion doesn't handle _copy_N files

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.14.0
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: Transactions
    • Labels: None

    Description

      create table T(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES('transactional'='false');
      insert into T(a,b) values(1,2);
      insert into T(a,b) values(1,3);
      alter table T SET TBLPROPERTIES ('transactional'='true');
      

      We should now have bucket files 000001_0 and 000001_0_copy_1, but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that copy_N files can exist. It restarts row numbering at 0 for each file in the bucket, thus generating duplicate ROW__IDs.

      select ROW__ID, INPUT__FILE__NAME, a, b from T;
      

      produces

      {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0,1,2
      {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0_copy_1,1,3
      

      owen.omalley, do you have any thoughts on a good way to handle this?

      The attached patch has a few changes to make Acid recognize copy_N files at all, but this is just a prerequisite. The new unit test demonstrates the issue.
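
      To make the duplicate-ID problem concrete, here is a minimal Java sketch of one possible way to number rows across the files of a single bucket: each file starts where the previous one ended instead of restarting at 0. The class and method names (RowIdOffsets, firstRowIds) are hypothetical and exist only for illustration; this is not code from OrcRawRecordMerger or from the attached patch, and it assumes the files of a bucket can be visited in a fixed, repeatable order.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Hive.
class RowIdOffsets {
    /**
     * Given the row counts of the files that make up one bucket, in a fixed
     * order (for example 000001_0, then 000001_0_copy_1), returns the rowid
     * that the first row of each file would receive so that IDs never collide.
     */
    static long[] firstRowIds(long[] rowCounts) {
        long[] offsets = new long[rowCounts.length];
        long next = 0;
        for (int i = 0; i < rowCounts.length; i++) {
            offsets[i] = next;      // first rowid handed out for file i
            next += rowCounts[i];   // the next file continues where this one ends
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two files with one row each, as in the example in this issue.
        long[] offsets = firstRowIds(new long[] {1, 1});
        System.out.println(Arrays.toString(offsets)); // prints [0, 1]
    }
}
```

      With this numbering, the row in 000001_0 would keep rowid 0 and the row in 000001_0_copy_1 would get rowid 1. The scheme only works if every reader visits the files in the same order.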

      Furthermore,

      alter table T compact 'major';
      select ROW__ID, INPUT__FILE__NAME, a, b from T order by b;
      

      produces a single row; the row (1,3) from 000001_0_copy_1 is missing

      {"transactionid":0,"bucketid":1,"rowid":0}	file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands....warehouse/nonacidorctbl/base_-9223372036854775808/bucket_00001	1	2
      

      HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() demonstrating this

      This is because the compactor doesn't handle copy_N files either (it skips them).
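
      As a rough sketch of what "handling" copy_N files could involve, the Java snippet below groups original file names by bucket and orders each group by copy number, so that a reader or compactor would see 000001_0_copy_1 alongside 000001_0 instead of skipping it. The helper class OriginalFiles and the name pattern are assumptions inferred from the file names shown in this issue; they are not Hive's actual API and not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not Hive's actual API.
class OriginalFiles {
    // Assumed naming: <bucket digits>_<digits>, optionally followed by _copy_<N>.
    private static final Pattern NAME =
        Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    /** Returns the bucket number, or -1 if the name does not match the assumed pattern. */
    static int bucket(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns 0 for a plain file, N for a _copy_N file, or -1 if the name does not match. */
    static int copyNumber(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return -1;
        }
        return m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
    }

    /** Groups matching file names by bucket; each group is ordered by copy number. */
    static Map<Integer, List<String>> groupByBucket(List<String> fileNames) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        for (String name : fileNames) {
            int b = bucket(name);
            if (b < 0) {
                continue; // not an original bucket file under the assumed naming
            }
            byBucket.computeIfAbsent(b, k -> new ArrayList<>()).add(name);
        }
        for (List<String> files : byBucket.values()) {
            // Stable sort: files with the same copy number keep their input order.
            files.sort(Comparator.comparingInt(OriginalFiles::copyNumber));
        }
        return byBucket;
    }
}
```

      For the table in this issue, groupByBucket would place 000001_0 and 000001_0_copy_1 in the same group for bucket 1, in that order.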

      Attachments

        1. HIVE-16177.20-branch-2.patch (97 kB, Eugene Koifman)
        2. HIVE-16177.19-branch-2.patch (97 kB, Eugene Koifman)
        3. HIVE-16177.18-branch-2.patch (95 kB, Eugene Koifman)
        4. HIVE-16177.18.patch (96 kB, Eugene Koifman)
        5. HIVE-16177.17.patch (96 kB, Eugene Koifman)
        6. HIVE-16177.16.patch (100 kB, Eugene Koifman)
        7. HIVE-16177.15.patch (100 kB, Eugene Koifman)
        8. HIVE-16177.14.patch (74 kB, Eugene Koifman)
        9. HIVE-16177.11.patch (96 kB, Eugene Koifman)
        10. HIVE-16177.10.patch (96 kB, Eugene Koifman)
        11. HIVE-16177.09.patch (97 kB, Eugene Koifman)
        12. HIVE-16177.08.patch (85 kB, Eugene Koifman)
        13. HIVE-16177.07.patch (79 kB, Eugene Koifman)
        14. HIVE-16177.04.patch (43 kB, Eugene Koifman)
        15. HIVE-16177.02.patch (9 kB, Eugene Koifman)
        16. HIVE-16177.01.patch (8 kB, Eugene Koifman)

            People

              Assignee: Eugene Koifman (ekoifman)
              Reporter: Eugene Koifman (ekoifman)