HIVE-17612

Hive does not insert dynamic partition-sets atomically


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.2.0, 3.0.0
    • Fix Version/s: None
    • Component/s: CLI, Hive
    • Labels: None

    Description

      When partitions are inserted into a Hive table with a Hive query (e.g. INSERT OVERWRITE TABLE my_table PARTITION (foo, bar) SELECT * FROM another_table;), each dynamic partition is registered separately, using HMSC.append_partition(). By contrast, Pig/HCatLoader registers the whole set atomically, using HMSC.add_partitions().

      Because of this behaviour, Oozie workflows might kick off as soon as the first partition is registered, before the last partition in the set is available.

      This was verified in the metastore logs: the same query fired multiple ADD_PARTITION events (one per added partition) instead of a single event for the whole set.
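
      The difference in event counts can be illustrated with a small, self-contained sketch. This is only a toy model under stated assumptions: ToyMetastore, appendPartition(), addPartitions() and coveredByFirstEvent() are hypothetical names made up for illustration and are not Hive's metastore client API. The sketch assumes an initially empty table and a consumer that reacts to the first event it sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory stand-in for a metastore; NOT Hive's API. */
final class ToyMetastore {
    // Each entry models one notification event and the partitions it covers.
    private final List<List<String>> events = new ArrayList<>();

    /** Per-partition registration: every call fires its own event. */
    void appendPartition(String partition) {
        events.add(Collections.singletonList(partition));
    }

    /** Batch registration: the whole set is registered with a single event. */
    void addPartitions(List<String> batch) {
        events.add(new ArrayList<>(batch));
    }

    /** All events fired so far, in order. */
    List<List<String>> events() {
        return events;
    }

    /** Partitions a consumer knows about if it reacts to the first event only. */
    List<String> coveredByFirstEvent() {
        return events.isEmpty() ? Collections.<String>emptyList() : events.get(0);
    }

    public static void main(String[] args) {
        // Illustrative partition names for PARTITION (foo, bar).
        List<String> parts = Arrays.asList("foo=1/bar=a", "foo=1/bar=b", "foo=2/bar=a");

        ToyMetastore perPartition = new ToyMetastore();
        for (String p : parts) {
            perPartition.appendPartition(p);
        }

        ToyMetastore batched = new ToyMetastore();
        batched.addPartitions(parts);

        System.out.println("per-partition events: " + perPartition.events().size()
                + ", first event covers " + perPartition.coveredByFirstEvent());
        System.out.println("batched events: " + batched.events().size()
                + ", first event covers " + batched.coveredByFirstEvent());
    }
}
```

      In this model, a consumer triggered by the first event sees only one partition in the per-partition case, but the full set in the batched case, which is the behaviour this issue asks Hive to provide.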

      It would be ideal for Hive to provide atomic partition-adds.

          People

            Assignee: Mithun Radhakrishnan (mithun)
            Reporter: Mithun Radhakrishnan (mithun)