Spark / SPARK-15705
Spark won't read ORC schema from metastore for partitioned tables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: SQL
    • Labels: None
    • Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)

    Description

      Spark does not seem to read the schema from the Hive metastore for partitioned tables stored as ORC files. Instead, it appears to read the schema from the ORC files themselves, which, if the files were written by Hive, does not match the metastore schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:

      In Hive:

      hive> create table default.test (id BIGINT, name STRING) partitioned by (state STRING) stored as orc;
      hive> insert into table default.test partition (state="CA") values (1, "mike"), (2, "steve"), (3, "bill");
      

      In Spark:

      scala> spark.table("default.test").printSchema
      

      Expected result: Spark should preserve the column names that were defined in Hive, i.e. print something like:
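
      root
       |-- id: long (nullable = true)
       |-- name: string (nullable = true)
       |-- state: string (nullable = true)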

      Actual result:

      root
       |-- _col0: long (nullable = true)
       |-- _col1: string (nullable = true)
       |-- state: string (nullable = true)
      
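      The _col0/_col1 names match what the ORC files themselves contain: before HIVE-4243, Hive wrote ORC files with positional column names instead of the declared ones. As a sanity check, reading the files directly should print the same _col0/_col1 schema (minus the partition column); the warehouse path below is an assumption based on the default HDP layout:

      scala> // Path is hypothetical; adjust to your Hive warehouse location.
      scala> spark.read.orc("/apps/hive/warehouse/test/state=CA").printSchema
      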

      Possibly related to SPARK-14959?
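
      Until this is fixed, one possible workaround is to disable Spark's native ORC conversion so the table is read through the Hive SerDe, which takes its schema from the metastore. The spark.sql.hive.convertMetastoreOrc setting exists for exactly this switch, though whether it avoids the bug in 2.0.0 is untested here, so treat this as a sketch:

      scala> // Sketch: launch with native ORC conversion disabled, e.g.
      scala> //   spark-shell --conf spark.sql.hive.convertMetastoreOrc=false
      scala> spark.table("default.test").printSchema  // should report id/name if the SerDe path is used
      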


People

    Assignee: Yin Huai (yhuai)
    Reporter: Nic Eggert (nseggert)
    Votes: 0
    Watchers: 12
