Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-26649

Hive metabase performance issues due to slow queries

    XMLWordPrintableJSON

Details

    • Patch

    Description

      When the Hive metabase uses Mysql, during the peak period of Hive statement query, the metastore initiates a large amount of DirectSQL, which will cause performance problems in the metabase. The fundamental reason is that some DirectSQL performance problems cause a large number of slow queries at the DB level.
      For example for the following Hive query:

      select * from imd_fcac_safe.fcac_dw_loan_details where ds='2021-10-10' and sysid='MCFCM' 

      where ds and sysid are the primary and secondary partitions of the imd_fcac_safe.fcac_dw_loan_details table, respectively
       
      The Hive statement will generate the DirectSQL query as follows:

      explain select PARTITIONS.PART_ID from PARTITIONS  inner join TBLS on PARTITIONS.TBL_ID = TBLS.TBL_ID     and TBLS.TBL_NAME = 'fcac_dw_loan_details'   inner join DBS on TBLS.DB_ID = DBS.DB_ID      and DBS.NAME = 'imd_fcac_safe' inner join PARTITION_KEY_VALS FILTER0 on FILTER0.PART_ID = PARTITIONS.PART_ID and FILTER0.INTEGER_IDX = 0 inner join PARTITION_KEY_VALS FILTER1 on FILTER1.PART_ID = PARTITIONS.PART_ID and FILTER1.INTEGER_IDX = 1 where ( ((FILTER0.PART_KEY_VAL = '2021-10-10') and (FILTER1.PART_KEY_VAL = 'MCFCM')) ) 


       
      Problems with this statement
      There is no TBL_ID field in the PARTITION_KEY_VALS table, which will cause the partition of the same name of the unrelated table to be described when performing an associated query; there is no index column in the PARTITION_KEY_VAL table, so it cannot be accelerated by the index.

      Attachments

        Issue Links

          Activity

            People

              yo8237233 yihangqiao
              yo8237233 yihangqiao
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - 96h
                  96h
                  Remaining:
                  Remaining Estimate - 95h 10m
                  95h 10m
                  Logged:
                  Remaining Estimate - 95h 10m
                  50m