Hive / HIVE-22002

Insert into table partition fails partially when hive.stats.column.autogather is on.

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: HiveServer2
    • Labels:
      None

      Description

      create table test_double(id int) partitioned by (dbtest double);
      insert into test_double partition(dbtest) values (1,9.9); --> this works
      insert into test_double partition(dbtest) values (1,10); --> this fails

      But if we change it to use an explicit cast, it succeeds:
      insert into test_double partition(dbtest) values (1, cast (10 as double));

      The problem is seen only when inserting a whole number, e.g. 10, 10.0, 15, 14.0. It is not seen when inserting a number with a nonzero fractional part, so an insert of 10.1 goes through.

      The underlying exception from the HMS is

      2019-07-11T07:58:16,670  [pool-6-thread-196]: server.TThreadPoolServer (TThreadPoolServer.java:run(297)) -  occurred during processing of message.
      java.lang.IndexOutOfBoundsException: Index: 0
          at java.util.Collections$EmptyList.get(Collections.java:4454) ~[?:1.8.0_112]
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.updatePartColumnStatsWithMerge(HiveMetaStore.java:7808) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
          at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.set_aggr_stats_for(HiveMetaStore.java:7769) ~[hive-exec-3.1.0.3.1.0.0-78.jar:3.1.0.3.1.0.0-78]
      

      With hive.stats.column.autogather=false, this exception does not occur with or without the explicit casting.

      The issue stems from the fact that HS2 creates the partition with value dbtest=10, while the stats processor attempts to add column statistics for a partition with value dbtest=10.0. HMS getPartitionsByNames therefore cannot find a partition with that value and fails to insert the stats. So while the failure surfaces on the HMS side, the cause is in the HS2 query planning.
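
      The mismatch described above can be illustrated with a small, self-contained Java sketch. This is not the Hive metastore code: the class PartitionLookupSketch and its methods are hypothetical stand-ins, and the only assumptions are the ones stated in this report (the partition is stored under the name dbtest=10, the stats update looks it up as dbtest=10.0, and the caller reads element 0 of the lookup result without checking that it is non-empty).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a name-keyed partition store; not Hive's actual classes.
public class PartitionLookupSketch {
    // Partition name -> partition record (a plain string here for brevity).
    private final Map<String, String> partitionsByName = new HashMap<>();

    public void addPartition(String name) {
        partitionsByName.put(name, "partition[" + name + "]");
    }

    // Returns only exact name matches; names that are not found are skipped silently.
    public List<String> getPartitionsByNames(List<String> names) {
        List<String> found = new ArrayList<>();
        for (String name : names) {
            String partition = partitionsByName.get(name);
            if (partition != null) {
                found.add(partition);
            }
        }
        return found;
    }

    // Mimics a caller that assumes the lookup always yields at least one partition.
    public String firstPartitionFor(String name) {
        return getPartitionsByNames(Arrays.asList(name)).get(0);
    }

    public static void main(String[] args) {
        PartitionLookupSketch store = new PartitionLookupSketch();
        store.addPartition("dbtest=10");            // name used when the partition is created

        // Looking up the stored name succeeds.
        System.out.println(store.firstPartitionFor("dbtest=10"));

        // Looking up the name used for the stats update finds nothing,
        // so reading element 0 of the empty result throws.
        try {
            store.firstPartitionFor("dbtest=10.0");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("stats update would fail: " + e.getClass().getName());
        }
    }
}
```

      In this model the partition itself exists, but any step that addresses it as dbtest=10.0 gets an empty result, which matches the IndexOutOfBoundsException in the stack trace.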

      It makes sense that turning off hive.stats.column.autogather resolves the issue, because there is then no StatsTask in the query plan.

      But SHOW PARTITIONS shows the partition as created, while the query planner does not include it in any plan because the partition has no stats.

        Attachments

        1. image-2019-07-31-20-02-38-069.png
          15 kB
          bencao
        2. HIVE-22002.patch
          4 kB
          bencao

            People

            • Assignee:
              bencao
              Reporter:
              Naveen Gangam (ngangam)
            • Votes:
              0
              Watchers:
              5
