Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9481

allow column list specification in INSERT statement

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.14.0
    • 1.2.0
    • Parser, Query Processor, SQL
    • None

    Description

      Given a table FOO(a int, b int, c int), ANSI SQL supports insert into FOO(c,b) select x,y from T. The expectation is that 'x' is written to column 'c' and 'y' is written column 'b' and 'a' is set to NULL, assuming column 'a' is NULLABLE.

      Hive does not support this. In Hive one has to ensure that the data producing statement has a schema that matches target table schema.

      Since Hive doesn't support DEFAULT value for columns in CREATE TABLE, when target schema is explicitly provided, missing columns will be set to NULL if they are NULLABLE, otherwise an error will be raised.

      If/when DEFAULT clause is supported, this can be enhanced to set default value rather than NULL.

      Thus, given

      create table source (a int, b int);
      create table target (x int, y int, z int);
      create table target2 (x int, y int, z int);
      
      insert into target(y,z) select * from source;

      will mean

      insert into target select null as x, a, b from source;

      and

      insert into target(z,y) select * from source;

      will meant

      insert into target select null as x, b, a from source;

      Also,

      from source 
        insert into target(y,z) select null as x, * 
        insert into target2(y,z) select null as x, source.*;
      

      and for partitioned tables, given

      Given:
      CREATE TABLE pageviews (userid VARCHAR(64), link STRING, "from" STRING)
        PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;
      
      INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23')(userid,link)  
         VALUES ('jsmith', 'mail.com');
      

      And dynamic partitioning

      INSERT INTO TABLE pageviews PARTITION (datestamp)(userid,datestamp,link) 
          VALUES ('jsmith', '2014-09-23', 'mail.com');
      

      In all cases, the schema specification contains columns of the target table which are matched by position to the values produced by VALUES clause/SELECT statement. If the producer side provides values for a dynamic partition column, the column should be in the specified schema. Static partition values are part of the partition spec and thus are not produced by the producer and should not be part of the schema specification.

      Attachments

        1. HIVE-9481.2.patch
          15 kB
          Eugene Koifman
        2. HIVE-9481.4.patch
          37 kB
          Eugene Koifman
        3. HIVE-9481.5.patch
          49 kB
          Eugene Koifman
        4. HIVE-9481.6.patch
          50 kB
          Eugene Koifman
        5. HIVE-9481.patch
          14 kB
          Eugene Koifman

        Issue Links

          Activity

            People

              ekoifman Eugene Koifman
              ekoifman Eugene Koifman
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: