Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-20317

Spark Dynamic Partition Pruning - Use Stats to Determine Partition Count

Log workAgile BoardRank to TopRank to BottomBulk Copy AttachmentsBulk Move AttachmentsAdd voteVotersWatch issueWatchersCreate sub-taskConvert to sub-taskMoveLinkCloneLabelsUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.1.0, 4.0.0
    • None
    • Spark
    • None

    Description

      <!-- HMS -->
      <property>
      <name>hive.metastore.limit.partition.request</name>
      <value>2</value>
      </property>
      
      CREATE TABLE partitioned_user(
      	firstname VARCHAR(64),
      	lastname  VARCHAR(64)
      ) PARTITIONED BY (country VARCHAR(64))
      STORED AS PARQUET;
      
      CREATE TABLE country(
          name VARCHAR(64)
      ) STORED AS PARQUET;
      
      insert into partitioned_user partition (country='USA') values ("John", "Doe");
      insert into partitioned_user partition (country='UK') values ("Sir", "Arthur");
      insert into partitioned_user partition (country='FR') values ("Jacque", "Martin");
      
      insert into country values ('USA');
      
      set hive.execution.engine=spark;
      set hive.spark.dynamic.partition.pruning=true;
      explain select * from partitioned_user u where u.country in (select c.name from country c);
      -- Error while compiling statement: FAILED: SemanticException MetaException(message:Number of partitions scanned (=3) on table 'partitioned_user' exceeds limit (=2). This is controlled on the metastore server by hive.metastore.limit.partition.request.)
      

      The EXPLAIN plan generation fails because there are three partitions involved in this query. However, since Spark DPP is enabled, Hive should be able to use table stats to know that the country table only has one record and therefore there will only need to be one partitioned scanned and allow this query to execute.

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            Unassigned Unassigned Assign to me
            belugabehr David Mollitor

            Dates

              Created:
              Updated:

              Slack

                Issue deployment