Hive / HIVE-26893

Extend batch partition APIs to ignore partition schemas

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Metastore

    Description

      Several HMS APIs return a list of partitions, e.g. get_partitions_ps(), get_partitions_by_names(), and add_partitions_req() with needResult=true. Each returned partition instance carries its own list of FieldSchemas as the partition schema:

      org.apache.hadoop.hive.metastore.api.Partition
      -> org.apache.hadoop.hive.metastore.api.StorageDescriptor
         ->  cols: list<org.apache.hadoop.hive.metastore.api.FieldSchema> 

      This can consume a large amount of memory for wide tables (e.g. with 2k columns). See the heap histogram in IMPALA-11812 for an example.

      Some engines, like Impala, don't actually use or respect the partition-level schema, so transmitting it wastes network and serialization/deserialization resources. It would be nice if these APIs provided an optional boolean flag for ignoring partition schemas, so that HMS clients (e.g. Impala) don't need to clear them afterwards to save memory.
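
      To illustrate the request, here is a minimal, self-contained sketch. It does not use the real HMS client or Thrift classes: FieldSchema, StorageDescriptor and Partition below are simplified stand-ins, and getPartitions() and its skipColumnSchema flag are hypothetical names chosen for this example only. It simply counts how many FieldSchema objects a batch response would carry with and without the proposed flag.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSchemaSketch {
    // Simplified stand-ins for the Thrift-generated HMS classes (NOT the real API).
    static class FieldSchema {
        final String name;
        final String type;
        FieldSchema(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    static class StorageDescriptor {
        List<FieldSchema> cols = new ArrayList<>();
    }

    static class Partition {
        final List<String> values = new ArrayList<>();
        final StorageDescriptor sd = new StorageDescriptor();
    }

    // Hypothetical batch fetch: builds numPartitions partitions of a table with
    // numCols columns. When skipColumnSchema is true, sd.cols is left empty,
    // which is the behavior this issue asks for.
    static List<Partition> getPartitions(int numPartitions, int numCols,
                                         boolean skipColumnSchema) {
        List<Partition> result = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            Partition part = new Partition();
            part.values.add("p" + p);
            if (!skipColumnSchema) {
                // Each partition gets its own list of FieldSchema objects.
                for (int c = 0; c < numCols; c++) {
                    part.sd.cols.add(new FieldSchema("col" + c, "string"));
                }
            }
            result.add(part);
        }
        return result;
    }

    // Total number of FieldSchema objects held by a list of partitions.
    static long countFieldSchemas(List<Partition> parts) {
        long total = 0;
        for (Partition part : parts) {
            total += part.sd.cols.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // 100 partitions of a 2k-column table, as in the wide-table example.
        long withSchemas = countFieldSchemas(getPartitions(100, 2000, false));
        long withoutSchemas = countFieldSchemas(getPartitions(100, 2000, true));
        System.out.println("FieldSchema objects with schemas:    " + withSchemas);
        System.out.println("FieldSchema objects without schemas: " + withoutSchemas);
    }
}
```

      The count grows as partitions × columns when schemas are included, which is why a server-side flag would save both transfer and client memory compared with clearing the lists after deserialization.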

            People

              Assignee: Sai Hemanth Gantasala (hemanth619)
              Reporter: Quanlong Huang (stigahuang)
              Votes: 0
              Watchers: 3
