  1. Kudu
  2. KUDU-3401

Unable to query Kudu tables from Hive with Kudu HMS Integration enabled


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • 1.15.0
    • hms
    • None

    Description

      When Kudu HMS integration is enabled, a table created from Impala with a "stored as kudu" query is registered in the Hive Metastore with several fields missing. As a result, querying the table from Hive after creating it fails with a ClassNotFoundException:

       

      ERROR : Failed
      org.apache.hadoop.hive.metastore.api.MetaException: java.lang.ClassNotFoundException Class not found
      at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
      at org.apache.hadoop.hive.metastore.HiveMetaStoreUtils.getDeserializer(HiveMetaStoreUtils.java:77) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
      at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:331) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141] 

       

      When the following sample query is run in Impala to create a Kudu table with Kudu HMS integration enabled, the table is created with the InputFormat, OutputFormat and SerDe Library fields missing:

       

      create table default.kudu_test (
      col1 string comment 'col1',
      col2 string comment 'col2',
      primary key (col1)
      )
      comment 'kudu_test'
      stored as kudu;

       

      SerDe Library:   NULL
      InputFormat:   NULL
      OutputFormat:   NULL

      Hive Metastore log for the table creation:
      INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-124]: 134: source:172.25.35.0 create_table: Table(tableName:kudu_test, dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), FieldSchema(name:col2, type:string, comment:col2)], location:, inputFormat:, outputFormat:, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:, serializationLib:, parameters:{}), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{kudu.table_name=default.kudu_test, kudu.table_id=5ac46856863f402fb69941ce7b967945, comment=, kudu.master_addresses=c3549-node2.coelab.cloudera.com:7051, storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, kudu.cluster_id=65c8dfbc8b75485db1328ab42f55fa07}, viewOriginalText:, viewExpandedText:, tableType:MANAGED_TABLE, temporary:false, ownerType:USER)

      Running the same query in Impala with Kudu HMS integration disabled, on the other hand, creates the table with these fields populated:

      SerDe Library: org.apache.hadoop.hive.kudu.KuduSerDe NULL
      InputFormat: org.apache.hadoop.hive.kudu.KuduInputFormat NULL
      OutputFormat: org.apache.hadoop.hive.kudu.KuduOutputFormat NULL
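      The difference between the two field listings above can be checked mechanically. The following is a minimal sketch, not part of Kudu, Impala or Hive: the function name `missing_storage_fields` is hypothetical, and the parsing assumes the "Field: value" layout shown above, treating an empty value or a leading NULL as missing.

```python
# Hypothetical helper: report which storage fields are unset in a
# "Field: value" listing like the two shown in this issue.
REQUIRED_FIELDS = ("SerDe Library", "InputFormat", "OutputFormat")


def missing_storage_fields(listing):
    """Return the required fields whose value is empty or NULL."""
    missing = []
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in REQUIRED_FIELDS:
            continue
        tokens = value.split()
        # The first token is the value; a trailing NULL is a separate column.
        if not tokens or tokens[0] == "NULL":
            missing.append(key)
    return missing


hms_enabled = """SerDe Library:   NULL
InputFormat:   NULL
OutputFormat:   NULL"""

hms_disabled = """SerDe Library: org.apache.hadoop.hive.kudu.KuduSerDe NULL
InputFormat: org.apache.hadoop.hive.kudu.KuduInputFormat NULL
OutputFormat: org.apache.hadoop.hive.kudu.KuduOutputFormat NULL"""

print(missing_storage_fields(hms_enabled))
print(missing_storage_fields(hms_disabled))
```

      With the listings from this issue, the first call reports all three fields as missing and the second reports none.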

      Hive Metastore log for table creation:
      INFO org.apache.hadoop.hive.metastore.HiveMetaStore: [pool-5-thread-173]: 183: source:172.25.35.0 create_table_req: Table(tableName:kudu_test, dbName:default, owner:root, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:col1), FieldSchema(name:col2, type:string, comment:col2)], location:null, inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat, outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}), bucketCols:[], sortCols:[], parameters:null), partitionKeys:[], parameters:{comment=kudu_test_lbodor_no_hms_integration, kudu.master_addresses=c3549-node2.coelab.cloudera.com, storage_handler=org.apache.hadoop.hive.kudu.KuduStorageHandler, kudu.table_name=impala::default.kudu_test}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, catName:hive, ownerType:USER, accessType:8)
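      The same three fields can also be pulled straight out of the Hive Metastore create_table log lines. This is only a sketch: the helper name `storage_fields_from_log` is hypothetical, and the regular expression assumes the `name:value` rendering seen in the two log entries above, where a missing value is either empty or the literal `null`.

```python
import re

# Matches "inputFormat:<value>", "outputFormat:<value>" and
# "serializationLib:<value>", where the value ends at "," or ")".
_SD_FIELD = re.compile(r"\b(inputFormat|outputFormat|serializationLib):([^,)]*)")


def storage_fields_from_log(log_line):
    """Map each storage field found in the log line to its value, or None if unset."""
    fields = {}
    for name, value in _SD_FIELD.findall(log_line):
        value = value.strip()
        fields[name] = None if value in ("", "null") else value
    return fields


# Abbreviated excerpts of the two log entries in this issue.
enabled_excerpt = (
    "sd:StorageDescriptor(cols:[], location:, inputFormat:, outputFormat:, "
    "compressed:false, numBuckets:0, "
    "serdeInfo:SerDeInfo(name:, serializationLib:, parameters:{}))"
)
disabled_excerpt = (
    "sd:StorageDescriptor(cols:[], location:null, "
    "inputFormat:org.apache.hadoop.hive.kudu.KuduInputFormat, "
    "outputFormat:org.apache.hadoop.hive.kudu.KuduOutputFormat, "
    "compressed:false, numBuckets:0, "
    "serdeInfo:SerDeInfo(name:null, "
    "serializationLib:org.apache.hadoop.hive.kudu.KuduSerDe, parameters:{}))"
)

print(storage_fields_from_log(enabled_excerpt))
print(storage_fields_from_log(disabled_excerpt))
```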
      --------------------------------
      Code path for table creation when Kudu HMS integration is enabled (Kudu code path):

      Quick recap of the steps when creating a Kudu table:

      HMSCatalog::CreateTable() -> hive::Table declared and passed to PopulateTable(…, &table) -> Thrift client Execute call -> HMSClient::CreateTable(table (the one that was just populated), envcontext (default)) -> hms_client.create_table_with_environment_context(table, envcontext).

      CreateTable

      https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L146 ->

      Populate the fields of table

      https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_catalog.cc#L367

      Hms client call

      https://github.com/apache/kudu/blob/master/src/kudu/hms/hms_client.cc#L280

      ----------------------------- 

      Code path for table creation when Kudu HMS integration is disabled (Impala code path):
      CreateTable -> CreateMetaStoreTable

      https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3191

      -> line 3248: tbl.setSd(createSd(params));

      CreateSd

      https://github.com/apache/impala/blob/da3d6fc7f7c656b118bb3570cedf7d7c3158bd0b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L3260

       

      Comparing the two code paths shows that, for a table created without Kudu HMS integration (through Impala), the missing fields are filled in with default values by CreateSd.

      These fields are left untouched when the table is created with Kudu HMS integration enabled (Kudu code path).
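      One way to close the gap is to fill in the same defaults on the Kudu code path before the table is sent to the Hive Metastore. The sketch below only illustrates that idea and is not the actual fix: the `Table` and `StorageDescriptor` classes are simplified, hypothetical stand-ins for the Thrift types, `populate_kudu_storage_fields` is an invented name, and the class names are the ones that appear in the logs of the table created without Kudu HMS integration.

```python
from dataclasses import dataclass, field

# Values taken from the HMS log of the table created through Impala.
KUDU_STORAGE_HANDLER = "org.apache.hadoop.hive.kudu.KuduStorageHandler"
KUDU_INPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduInputFormat"
KUDU_OUTPUT_FORMAT = "org.apache.hadoop.hive.kudu.KuduOutputFormat"
KUDU_SERDE = "org.apache.hadoop.hive.kudu.KuduSerDe"


@dataclass
class StorageDescriptor:
    # Empty strings mirror the unset values seen with HMS integration enabled.
    input_format: str = ""
    output_format: str = ""
    serialization_lib: str = ""


@dataclass
class Table:
    name: str
    parameters: dict = field(default_factory=dict)
    sd: StorageDescriptor = field(default_factory=StorageDescriptor)


def populate_kudu_storage_fields(table):
    """Fill unset storage fields with Kudu defaults; leave other tables alone."""
    if table.parameters.get("storage_handler") != KUDU_STORAGE_HANDLER:
        return table
    sd = table.sd
    # Only fill fields that are still empty so explicit values are preserved.
    sd.input_format = sd.input_format or KUDU_INPUT_FORMAT
    sd.output_format = sd.output_format or KUDU_OUTPUT_FORMAT
    sd.serialization_lib = sd.serialization_lib or KUDU_SERDE
    return table


kudu_table = populate_kudu_storage_fields(
    Table("kudu_test", {"storage_handler": KUDU_STORAGE_HANDLER})
)
print(kudu_table.sd)
```

      Filling only empty fields keeps the function idempotent, so it is safe to apply to a table whose descriptor is already complete.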


          People

            kmammadli Khazar Mammadli
            kmammadli Khazar Mammadli