  Hive / HIVE-24885

The unset state of the low or high value in LongColumnStatsData cannot be retrieved


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: API
    • Labels: None

    Description

      While extending Impala's column statistics to compute min/max values for columns, it was found that once the low or high value in LongColumnStatsData has been unset, that unset state cannot be read back. This is illustrated by the following Impala test case added to MetastoreEventsProcessorTest.

                                                                                      
        @Test
        public void testUnsetAndCheckUnsetLowHighValue() throws CatalogException {
          try (MetaStoreClient msClient = catalog_.getMetaStoreClient()) {
            List<String> colNames = new ArrayList<String>();
            colNames.add("id");
            colNames.add("int_col");
            colNames.add("bigint_col");
            // Fetch the stats and unset the low/high values on the returned objects.
            // Note: no update call is made here, so the modified objects are not
            // written back to the metastore.
            List<ColumnStatisticsObj> colStatsObjs =
                msClient.getHiveClient().getTableColumnStatistics(
                    "unique_database", "alltypes", colNames, "impala");
            for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
              ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
              LongColumnStatsData longColStatsData = colStatsData.getLongStats();
              longColStatsData.unsetLowValue();
              longColStatsData.unsetHighValue();
              colStatsData.setLongStats(longColStatsData);
            }
            // Fetch the stats again and check that the low/high values are reported as unset.
            colStatsObjs = msClient.getHiveClient().getTableColumnStatistics(
                "unique_database", "alltypes", colNames, "impala");
            for (ColumnStatisticsObj colStatsObj : colStatsObjs) {
              ColumnStatisticsData colStatsData = colStatsObj.getStatsData();
              LongColumnStatsData longColStatsData = colStatsData.getLongStats();
              assertFalse("isSetLowValue() should be false", longColStatsData.isSetLowValue());
              assertFalse(
                  "isSetHighValue() should be false", longColStatsData.isSetHighValue());
            }
          } catch (NoSuchObjectException e) {
            // fail() actually fails the test; assertFalse(msg, false) never did.
            fail(String.format("No such object exception: %s", e));
          } catch (MetaException e) {
            fail(String.format("Metadata exception: %s", e));
          } catch (TException e) {
            fail(String.format("TException: %s", e));
          }
        }
      

      Both isSetLowValue() and isSetHighValue() are expected to return false in the second loop, since unsetLowValue() and unsetHighValue() are called on every column's stats in the first loop.
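      The expectation above rests on how Thrift-generated Java classes track optional fields: in hive_metastore.thrift, lowValue and highValue of LongColumnStatsData are optional i64 fields, each backed by an "isset" bit that setLowValue() turns on and unsetLowValue() turns off, and Thrift writes an optional field only when it is set. The self-contained sketch below is a simplified model of that mechanism, not Hive or Thrift code: LongStatsModel and StatsStoreModel are hypothetical classes, and the stored values are illustrative only. It models the round trip the test relies on, where an unset made on a fetched copy is visible on a later read only after the modified object is written back.

```java
import java.util.HashMap;
import java.util.Map;

// Demo (hypothetical): the unset state survives a round trip only if the
// modified object is written back to the store.
class UnsetRoundTripDemo {
  public static void main(String[] args) {
    StatsStoreModel store = new StatsStoreModel();

    // Seed the hypothetical store with stats whose low/high values are set.
    LongStatsModel initial = new LongStatsModel();
    initial.setLowValue(0L);    // illustrative values only
    initial.setHighValue(100L);
    store.write("id", initial);

    // Unset on a fetched copy: only the local object changes.
    LongStatsModel fetched = store.read("id");
    fetched.unsetLowValue();
    fetched.unsetHighValue();
    System.out.println("local copy isSetLowValue: " + fetched.isSetLowValue());
    System.out.println("re-read before write-back isSetLowValue: "
        + store.read("id").isSetLowValue());

    // Write the modified copy back, then read again.
    store.write("id", fetched);
    System.out.println("re-read after write-back isSetLowValue: "
        + store.read("id").isSetLowValue());
  }
}

// Hypothetical, simplified imitation of a Thrift-generated struct with two
// optional i64 fields. This is NOT the real LongColumnStatsData class.
class LongStatsModel {
  private static final int LOW_VALUE_ISSET_ID = 0;
  private static final int HIGH_VALUE_ISSET_ID = 1;

  private long lowValue;
  private long highValue;
  // One bit per optional primitive field records whether it has been set.
  private byte issetBitfield = 0;

  void setLowValue(long value) {
    lowValue = value;
    issetBitfield |= (1 << LOW_VALUE_ISSET_ID);
  }

  void unsetLowValue() {
    issetBitfield &= ~(1 << LOW_VALUE_ISSET_ID);
  }

  boolean isSetLowValue() {
    return (issetBitfield & (1 << LOW_VALUE_ISSET_ID)) != 0;
  }

  void setHighValue(long value) {
    highValue = value;
    issetBitfield |= (1 << HIGH_VALUE_ISSET_ID);
  }

  void unsetHighValue() {
    issetBitfield &= ~(1 << HIGH_VALUE_ISSET_ID);
  }

  boolean isSetHighValue() {
    return (issetBitfield & (1 << HIGH_VALUE_ISSET_ID)) != 0;
  }

  // Copies the values together with the set/unset state, mirroring the effect
  // of a Thrift serialize/deserialize round trip (unset optional fields are
  // not written, so they arrive unset).
  LongStatsModel deepCopy() {
    LongStatsModel copy = new LongStatsModel();
    copy.lowValue = lowValue;
    copy.highValue = highValue;
    copy.issetBitfield = issetBitfield;
    return copy;
  }
}

// Hypothetical in-memory stand-in for the metastore: every read and write
// copies the object, so callers never share state with the stored value.
class StatsStoreModel {
  private final Map<String, LongStatsModel> stored = new HashMap<>();

  void write(String column, LongStatsModel stats) {
    stored.put(column, stats.deepCopy());
  }

  LongStatsModel read(String column) {
    return stored.get(column).deepCopy();
  }
}
```

      Saved as a single file and run with the Java 11+ source launcher (java UnsetRoundTripDemo.java), the three lines print false, true and false: the re-read before the write-back still reports the low value as set, and only the re-read after the write-back reports it as unset.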

      To build and run the test:

             
      mvn -f $IMPALA_HOME/fe/pom.xml test -e -Djava.compiler=NONE -ff -Dtest=MetastoreEventsProcessorTest#testUnsetAndCheckUnsetLowHighValue
      

      Table unique_database.alltypes is defined as follows.

        
       CREATE EXTERNAL TABLE unique_database.alltypes (
         id INT,
         bool_col BOOLEAN,
         tinyint_col TINYINT,
         smallint_col SMALLINT,
         int_col INT,
         bigint_col BIGINT,
         float_col FLOAT,
         double_col DOUBLE,
         date_string_col STRING,
         string_col STRING,
         timestamp_col TIMESTAMP,
         year INT
       )
       PARTITIONED BY (
         month INT
       )
       STORED AS PARQUET
       LOCATION 'hdfs://localhost:20500/test-warehouse/unique_database.db/alltypes'
       TBLPROPERTIES ('DO_NOT_UPDATE_STATS'='true', 'OBJCAPABILITIES'='EXTREAD,EXTWRITE',
         'STATS_GENERATED'='TASK', 'external.table.purge'='TRUE',
         'impala.lastComputeStatsTime'='1615492819', 'numRows'='0', 'totalSize'='0')
      

      It can be created by running the following statements in an Impala environment:

        
      CREATE DATABASE IF NOT EXISTS unique_database;
      USE unique_database;
      DROP TABLE IF EXISTS alltypes;
      CREATE TABLE alltypes
      PARTITIONED BY (month)
      STORED AS PARQUET
      AS SELECT * FROM functional_parquet.alltypes;
      

          People

            Assignee: Unassigned
            Reporter: Qifan Chen (sql_forever)
            Votes: 0
            Watchers: 1
