HIVE-14535: add insert-only ACID tables to Hive

Details

    Description

      Design doc:
      https://docs.google.com/document/d/1b3t1RywfyRb73-cdvkEzJUyOiekWwkMHdiQ-42zCllY

      Feel free to comment.

      Update: we ended up going with a sequence-number-based implementation.

      Update #2: this feature has been partially merged with ACID. The new table type is insert_only ACID. It differs from regular ACID in two ways: on the one hand, it supports only inserts; on the other, it places no restrictions on file format or table type (bucketing) and far fewer restrictions on other operations (export/import, list bucketing, etc.).
      Currently, some features that used to work when this was a separate implementation are not integrated properly; integrating them is the remaining work in this JIRA.
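
To illustrate the distinction drawn in Update #2, here is a minimal Java sketch that classifies a table as insert-only ACID from its table-properties map. The class `InsertOnlyTableCheck` and its methods are hypothetical and are not Hive's code. The property keys `transactional` and `transactional_properties` and the value `insert_only` are assumptions about how the table type is marked; none of them come from this JIRA.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hive. */
final class InsertOnlyTableCheck {
    // Assumed table-property keys and value (not stated in this JIRA).
    static final String TRANSACTIONAL = "transactional";
    static final String TRANSACTIONAL_PROPERTIES = "transactional_properties";
    static final String INSERT_ONLY = "insert_only";

    private InsertOnlyTableCheck() {}

    /** True if the properties mark the table as transactional at all. */
    static boolean isTransactional(Map<String, String> props) {
        return props != null && "true".equalsIgnoreCase(props.get(TRANSACTIONAL));
    }

    /** True only for a transactional table whose ACID flavor is insert-only. */
    static boolean isInsertOnly(Map<String, String> props) {
        return isTransactional(props)
            && INSERT_ONLY.equalsIgnoreCase(props.get(TRANSACTIONAL_PROPERTIES));
    }

    public static void main(String[] args) {
        Map<String, String> insertOnly = new HashMap<>();
        insertOnly.put(TRANSACTIONAL, "true");
        insertOnly.put(TRANSACTIONAL_PROPERTIES, INSERT_ONLY);

        Map<String, String> plain = new HashMap<>(); // no transactional properties

        System.out.println("insert-only table: " + isInsertOnly(insertOnly));
        System.out.println("plain table:       " + isInsertOnly(plain));
    }
}
```

Under these assumptions, a table that is transactional but carries no insert-only marker would be treated as regular ACID, which is the case where the file-format and bucketing restrictions mentioned above would still apply.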

        Sub-Tasks

          1. establish a separate path for FSOP to write into final path [Sub-task, Resolved, Sergey Shelukhin]
          2. pass information from FSOP/TezTask to commit to take care of speculative execution and failed tasks [Sub-task, Resolved, Sergey Shelukhin]
          3. edit or split MoveTask to commit job results to metastore [Sub-task, Resolved, Sergey Shelukhin]
          4. use metastore information on the read path appropriately [Sub-task, Resolved, Sergey Shelukhin]
          5. clean up file/txn information via a metastore thread [Sub-task, Resolved, Sergey Shelukhin]
          6. handle writing to dynamic partitions [Sub-task, Resolved, Sergey Shelukhin]
          7. handle unions [Sub-task, Resolved, Sergey Shelukhin]
          8. handle SKEWED BY for MM tables [Sub-task, Resolved, Sergey Shelukhin]
          9. handle hive.merge.*files in select queries [Sub-task, Resolved, Sergey Shelukhin]
          10. handle ctas for the MM tables [Sub-task, Resolved, Sergey Shelukhin]
          11. handle bucketing for MM tables [Sub-task, Resolved, Sergey Shelukhin]
          12. handle insert overwrite for MM tables [Sub-task, Resolved, Sergey Shelukhin]
          13. table conversion to and from MM [Sub-task, Resolved, Sergey Shelukhin]
          14. poison metastore APIs to make sure we can fail old clients for backward compat [Sub-task, Resolved, Sergey Shelukhin]
          15. handle ACID (or preclude ACID depending on the approach chosen) [Sub-task, Resolved, Unassigned]
          16. merge master into hive-14535 [Sub-task, Resolved, Sergey Shelukhin]
          17. support heartbeats for writeIds [Sub-task, Resolved, Unassigned]
          18. support in HBaseStore [Sub-task, Resolved, Unassigned]
          19. MM: support (or disable) alter table concatenate [Sub-task, Resolved, Sergey Shelukhin]
          20. integrate MM tables into ACID: add separate ACID type [Sub-task, Resolved, Wei Zheng]
          21. integrate MM tables into ACID: allow insert overwrite and don't require buckets, ORC etc. for the new type; don't run compaction [Sub-task, Resolved, Wei Zheng]
          22. integrate MM tables into ACID: replace MM metastore calls and structures with ACID ones [Sub-task, Resolved, Wei Zheng]
          23. integrate MM tables into ACID: merge cleaner into ACID threads [Sub-task, Resolved, Wei Zheng]
          24. run all tests for MM tables and fix the issues that are found [Sub-task, Resolved, Wei Zheng]
          25. don't use globStatus on S3 in MM tables [Sub-task, Resolved, Sergey Shelukhin]
          26. put FSOP manifests for the instances of the same vertex into a directory [Sub-task, Resolved, Sergey Shelukhin]
          27. handle load for MM tables [Sub-task, Resolved, Sergey Shelukhin]
          28. handle import for MM tables [Sub-task, Resolved, Sergey Shelukhin]
          29. handle truncate for MM tables (not atomic yet) [Sub-task, Resolved, Sergey Shelukhin]
          30. handle (or add a test for) multi-insert into MM tables [Sub-task, Resolved, Sergey Shelukhin]
          31. make sure export takes MM information into account [Sub-task, Resolved, Sergey Shelukhin]
          32. fix explain for MM tables - don't output for non-MM tables [Sub-task, Resolved, Sergey Shelukhin]
          33. integrate MM tables into ACID: replace "hivecommit" property with ACID property [Sub-task, Resolved, Wei Zheng]
          34. merge branch into master [Sub-task, Closed, Sergey Shelukhin]
          35. consider optimizing Utilities::handleMmTableFinalPath [Sub-task, Open, Unassigned]
          36. Driver::acquireWriteIds can be expensive trying to get details from MS [Sub-task, Resolved, Wei Zheng]
          37. MM tables - autoColumnStats_9.q different stats [Sub-task, Resolved, Sergey Shelukhin]
          38. MM tables - parquet_join test fails [Sub-task, Resolved, Sergey Shelukhin]
          39. MM tables - many queries duplicate the data after master merge [Sub-task, Resolved, Sergey Shelukhin]
          40. rename the new SQL files (if any remain) before merge to master [Sub-task, Resolved, Unassigned]
          41. MM tables: encrypted/(minimr?) CLI driver + fetch optimizer => no results [Sub-task, Resolved, Sergey Shelukhin]
          42. MM tables: fix (or disable) inferring buckets [Sub-task, Resolved, Sergey Shelukhin]
          43. MM tables: skewjoin test fails [Sub-task, Resolved, Sergey Shelukhin]
          44. MM tables: add exchange partition test after ACID integration [Sub-task, Resolved, Wei Zheng]
          45. instead of explicitly specifying mmWriteId during compilation phase, it should only be generated whenever needed during runtime [Sub-task, Resolved, Wei Zheng]
          46. Generate and use universal mmId instead of per db/table [Sub-task, Resolved, Wei Zheng]
          47. MM tables: most of the parquet tests fail (w/o MM enabled) [Sub-task, Resolved, Sergey Shelukhin]
          48. MM tables: mm_conversions test is broken [Sub-task, Resolved, Sergey Shelukhin]
          49. MM tables: suspicious ORC HDFS counter changes [Sub-task, Resolved, Sergey Shelukhin]
          50. Fix some regression caused by HIVE-14879 [Sub-task, Resolved, Wei Zheng]
          51. Fix an export/import bug due to ACID integration [Sub-task, Resolved, Wei Zheng]
          52. Restore CTAS tests in mm_all.q [Sub-task, Resolved, Wei Zheng]
          53. Add MM test for temporary table [Sub-task, Resolved, Wei Zheng]
          54. Converting table to insert-only acid may open a txn in an inappropriate place [Sub-task, Resolved, Eugene Koifman]
          55. rely on AcidUtils.getAcidState() for read path [Sub-task, Resolved, Wei Zheng]
          56. MM tables patch conflicts with HIVE-17482 (Spark/Acid integration) [Sub-task, Closed, Jason Dere]
          57. MetaStoreUtils.isToInsertOnlyTable(Map<String, String> props) is not needed [Sub-task, Resolved, Unassigned]
          58. DDLTask.generateAddMmTasks(Table tbl) and other random code should not start transactions [Sub-task, Closed, Sergey Shelukhin]
          59. DDLTask.handleRemoveMm() assumes locks not present [Sub-task, Resolved, Unassigned]
          60. export/import for MM tables is broken [Sub-task, Closed, Sergey Shelukhin]
          61. Bucketed/Sorted tables - SMB join [Sub-task, Open, Unassigned]
          62. Compaction for MM runs Cleaner - needs test once IOW is supported [Sub-task, Open, Unassigned]
          63. DBTxnManager.acquireLocks() - MM tables should use shared lock for Insert [Sub-task, Closed, Sergey Shelukhin]
          64. TableScanDesc.isAcidTable is restricted to FullAcid tables [Sub-task, Closed, Eugene Koifman]
          65. JavaUtils.extractTxnId() etc [Sub-task, Patch Available, Sergey Shelukhin]
          66. grep TODO HIVE-15212.17.patch | wc -l = 49 [Sub-task, Resolved, Sergey Shelukhin]
          67. AcidUtils.parseString(String propertiesStr) - missing "break" [Sub-task, Resolved, Sergey Shelukhin]
          68. BucketingSortingOpProcFactory.FileSinkInferrer - throw/return? [Sub-task, Resolved, Sergey Shelukhin]
          69. CompactorMR.run() should update compaction_queue table for MM [Sub-task, Open, Unassigned]
          70. Miscellaneous List [Sub-task, Open, Unassigned]
          71. remove the logic to convert from MM to plain hive table [Sub-task, Resolved, Sergey Shelukhin]
          72. collapse union all produced directories into delta directory name suffix for MM [Sub-task, Open, Unassigned]
          73. FileSinkDesk.getMergeInputDirName() uses stmtId=0 [Sub-task, Closed, Sergey Shelukhin]
          74. ReplCopyTask doesn't support multi-file CopyWork [Sub-task, Closed, Sergey Shelukhin]
          75. add a flag to automatically create most tables as MM [Sub-task, Closed, Sergey Shelukhin]
          76. conversion to MM tables via alter may be broken [Sub-task, Resolved, Steve Yeom]
          77. MM tables - IOW is not ACID compliant [Sub-task, Closed, Steve Yeom]
          78. MM - some union cases are broken [Sub-task, Closed, Sergey Shelukhin]
          79. MM tables - Tez merge may not run [Sub-task, Resolved, Unassigned]
          80. verify CTAS logic for MM tables [Sub-task, Resolved, Unassigned]
          81. MM tables - multi-IOW is broken [Sub-task, Resolved, Steve Yeom]
          82. MM LOAD DATA with OVERWRITE doesn't use base_n directory concept [Sub-task, Closed, Sergey Shelukhin]
          83. MM/ACID tables: make tests that will never be compatible with acid use non-txn tables explicitly [Sub-task, Patch Available, Sergey Shelukhin]
          84. IOW + DP is broken for insert-only ACID [Sub-task, Closed, Sergey Shelukhin]

            People

              Assignee: Sergey Shelukhin (sershe)
              Reporter: Sergey Shelukhin (sershe)
              Votes: 0
              Watchers: 25