Spark / SPARK-9762

ALTER TABLE cannot find column


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.4.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Ubuntu on AWS

    Description

      ALTER TABLE tbl CHANGE cannot find a column that DESCRIBE lists.

      In the case of a table generated with HiveContext.read.json(), the output of DESCRIBE dimension_components is:

      comp_config	struct<adText:string,adTextLeft:string,background:string,brand:string,button_color:string,cta_side:string,cta_type:string,depth:string,fixed_under:string,light:string,mid_text:string,oneline:string,overhang:string,shine:string,style:string,style_secondary:string,style_small:string,type:string>
      comp_criteria	string
      comp_data_model	string
      comp_dimensions	struct<data:string,integrations:array<string>,template:string,variation:bigint>
      comp_disabled	boolean
      comp_id	bigint
      comp_path	string
      comp_placementData	struct<mod:string>
      comp_slot_types	array<string>
      

      However, alter table dimension_components change comp_dimensions comp_dimensions struct<data:string,integrations:array<string>,template:string,variation:bigint,z:string>; fails with:

      15/08/08 23:13:07 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid column reference comp_dimensions
      	at org.apache.hadoop.hive.ql.exec.DDLTask.alterTable(DDLTask.java:3584)
      	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:312)
      	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
      	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
      	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
      	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
      	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
      	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
      	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
      	at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
      	at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
      	at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
      	at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
      	at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
      ...
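      For reference, the requested change appends exactly one field, z:string, to the existing struct type. A minimal plain-Python sketch (illustration only, not Spark or Hive API code; the helper name is made up) of deriving the new Hive type string from the one DESCRIBE reports:

```python
def add_struct_field(struct_type: str, name: str, hive_type: str) -> str:
    """Append one field to a Hive struct<...> type string (illustration only)."""
    if not (struct_type.startswith("struct<") and struct_type.endswith(">")):
        raise ValueError("not a struct type string")
    # Drop the trailing '>' and splice in the new name:type pair.
    return f"{struct_type[:-1]},{name}:{hive_type}>"

old = ("struct<data:string,integrations:array<string>,"
       "template:string,variation:bigint>")
# Produces the exact type used in the failing ALTER TABLE statement above.
print(add_struct_field(old, "z", "string"))
```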
      

      Meanwhile, SHOW COLUMNS in dimension_components lists two columns: col (which does not exist in the table) and z, which was just added.

      This suggests that DDL operations in Spark SQL use table metadata inconsistently. (A possible explanation, consistent with the "Not A Problem" resolution: tables created through the DataFrame reader are registered in the Hive metastore as datasource tables whose real schema is stored in table properties, with only a placeholder column visible to Hive-side DDL, so Hive's ALTER TABLE and SHOW COLUMNS never see the JSON-inferred columns.)

      The full spark-sql output is attached to the issue.


            People

              Assignee: Unassigned
              Reporter: Simeon Simeonov (simeons)
              Votes: 6
              Watchers: 5
