HIVE-17466: Metastore API to list unique partition-key-value combinations

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 3.0.0
    • Fix Version/s: 2.4.0, 3.0.0
    • Component/s: Metastore
    • Labels: None

    Description

      Raising this on behalf of thiruvel, who wrote this initially as part of a tangential "data-discovery" system.

      Programs like Apache Oozie and Apache Falcon (or Yahoo GDM) launch workflows based on the availability of tables and partitions. Partitions are currently discovered by listing them through (what boils down to) HiveMetaStoreClient.listPartitions(). This can be slow and cumbersome, because Partition objects are heavyweight and carry redundant information. The alternative is to fetch partition names, which then need client-side parsing to extract the partition-key values.
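
      To make the partition-name alternative concrete, here is a minimal, self-contained sketch of the client-side parsing it implies. It assumes names of the form key1=value1/key2=value2 with special characters escaped as %XX; the class and method names are hypothetical and are not taken from Hive or from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Hive.
class PartitionNameParsing {

    // Splits a name such as "dt=2017-09-01/hr=05" into an ordered key -> value map.
    static Map<String, String> parse(String partitionName) {
        Map<String, String> spec = new LinkedHashMap<>();
        for (String component : partitionName.split("/")) {
            int eq = component.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a key=value component: " + component);
            }
            spec.put(unescape(component.substring(0, eq)),
                     unescape(component.substring(eq + 1)));
        }
        return spec;
    }

    // Decodes %XX escapes (assumed escaping scheme); anything else is copied as-is.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                try {
                    out.append((char) Integer.parseInt(s.substring(i + 1, i + 3), 16));
                    i += 3;
                    continue;
                } catch (NumberFormatException e) {
                    // Not a valid escape; fall through and keep the '%' literally.
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Collects the distinct values of one partition key, in first-seen order.
    static Set<String> distinctValues(List<String> partitionNames, String key) {
        Set<String> values = new LinkedHashSet<>();
        for (String name : partitionNames) {
            String value = parse(name).get(key);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }
}
```

      Every caller that takes this route has to repeat the split, unescape, and de-duplicate steps, and the full list of names still has to travel from the metastore to the client first.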

      When checking which hourly partitions for a particular day have already been published, it would be preferable to have an API that pushes partition-key extraction down into the RawStore layer and returns the key values as the result. This would be similar to how SELECT DISTINCT part_key FROM my_table; would run, but at the HiveMetaStoreClient level.
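
      As a rough illustration of the shape such a call could take, the sketch below defines a hypothetical interface that returns distinct partition-key combinations, plus an in-memory stand-in for the metastore. None of these names come from the attached patches; the dt and hr keys and the in-memory implementation are assumptions made only so the example runs on its own.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical interface, named for illustration only; not the API in the patches.
interface PartitionValueSource {
    // Distinct combinations of the requested partition keys, each as key -> value.
    Set<Map<String, String>> distinctValues(String db, String table, List<String> keys);
}

// In-memory stand-in for a single table; a real implementation would run in the metastore.
class InMemoryPartitionValueSource implements PartitionValueSource {
    private final List<Map<String, String>> partitionSpecs;

    InMemoryPartitionValueSource(List<Map<String, String>> partitionSpecs) {
        this.partitionSpecs = partitionSpecs;
    }

    // Builds a spec from alternating key, value arguments.
    static Map<String, String> spec(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    @Override
    public Set<Map<String, String>> distinctValues(String db, String table, List<String> keys) {
        // db and table are ignored here because this stub holds one table only.
        Set<Map<String, String>> result = new LinkedHashSet<>();
        for (Map<String, String> partition : partitionSpecs) {
            Map<String, String> projected = new LinkedHashMap<>();
            for (String key : keys) {
                projected.put(key, partition.get(key));
            }
            result.add(projected); // the set drops duplicate combinations
        }
        return result;
    }
}

class HourlyAvailability {
    // Returns the sorted hours already published for the given day.
    static Set<String> publishedHours(PartitionValueSource source, String db,
                                      String table, String day) {
        Set<String> hours = new TreeSet<>();
        for (Map<String, String> combo : source.distinctValues(db, table, Arrays.asList("dt", "hr"))) {
            if (day.equals(combo.get("dt"))) {
                hours.add(combo.get("hr"));
            }
        }
        return hours;
    }
}
```

      The caller receives only small key-value maps and never touches Partition objects or partition-name strings. The day filter is applied on the client here purely to keep the sketch short.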

      Here's what we've been using at Yahoo.

      Attachments

        1. HIVE-17466.1.patch
          1.27 MB
          Mithun Radhakrishnan
        2. HIVE-17466.2.patch
          1.27 MB
          Mithun Radhakrishnan
        3. HIVE-17466.2-branch-2.patch
          1.24 MB
          Mithun Radhakrishnan
        4. HIVE-17466.3.patch
          1.25 MB
          Mithun Radhakrishnan

            People

              Assignee: Thiruvel Thirumoolan (thiruvel)
              Reporter: Mithun Radhakrishnan (mithun)