Spark / SPARK-19580

Support for avro.schema.url while writing to hive table


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.6.3, 2.1.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark does not support writing to a Hive table whose Avro schema is referenced by the `avro.schema.url` property.

      I have a Hive table stored in Avro format. The table is created with a query like this:

      CREATE TABLE some_table
        PARTITIONED BY (YEAR int, MONTH int, DAY int)
        ROW FORMAT SERDE
          'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
        STORED AS INPUTFORMAT
          'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
        OUTPUTFORMAT
          'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
        LOCATION 'hdfs:///user/some_user/some_table'
        TBLPROPERTIES (
          'avro.schema.url'='hdfs:///user/some_user/some_table.avsc'
        )

      Note that the table uses the `avro.schema.url` property, not `avro.schema.literal`, because we need to keep the schemas in separate files.
      Trying to write to such a table results in a NullPointerException (NPE).

      I tried to find a workaround, but none of the following helped:

      • setting `df.write.option("avroSchema", avroSchema)` with the schema passed explicitly as a string
      • changing TBLPROPERTIES to SERDEPROPERTIES
      • replacing the explicit SERDE/INPUTFORMAT/OUTPUTFORMAT specification with STORED AS AVRO

      I found that this can be fixed by adding a couple of lines to `org.apache.spark.sql.hive.HiveShim`, next to where `AvroSerdeUtils.AvroTableProperties.SCHEMA_LITERAL` is referenced.
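A minimal sketch of the lookup order such a change would need, written in plain Java so it runs without Hive or Spark on the classpath. It assumes the fix amounts to falling back to `avro.schema.url` when `avro.schema.literal` is absent, with the inline literal taking precedence. The class `AvroSchemaResolver`, the method `resolveSchema`, and the `urlLoader` callback are hypothetical names for illustration; this is not the code that was merged into Spark, and reading the `.avsc` file from HDFS is left to the caller-supplied loader.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper for illustration only; not Spark's or Hive's actual code.
final class AvroSchemaResolver {
    // Property names used in the CREATE TABLE statement above.
    static final String SCHEMA_LITERAL = "avro.schema.literal";
    static final String SCHEMA_URL = "avro.schema.url";

    private AvroSchemaResolver() {}

    /**
     * Returns the Avro schema JSON for a table, or Optional.empty() if the
     * properties carry neither an inline schema nor a schema URL.
     *
     * @param props     table or SerDe properties (e.g. TBLPROPERTIES)
     * @param urlLoader assumed callback that fetches the file behind a URL
     *                  (e.g. from HDFS); injected so no cluster is needed here
     */
    static Optional<String> resolveSchema(Map<String, String> props,
                                          Function<String, String> urlLoader) {
        // 1. Assumed precedence: an inline schema literal wins if present.
        String literal = props.get(SCHEMA_LITERAL);
        if (literal != null && !literal.isEmpty()) {
            return Optional.of(literal);
        }
        // 2. Otherwise fall back to the external .avsc file named by avro.schema.url.
        String url = props.get(SCHEMA_URL);
        if (url != null && !url.isEmpty()) {
            return Optional.ofNullable(urlLoader.apply(url));
        }
        // 3. No schema available: report absence explicitly instead of passing null on.
        return Optional.empty();
    }
}
```

Called with the table properties from the DDL above, it returns whatever the loader reads from `hdfs:///user/some_user/some_table.avsc`; with neither property set it returns an empty `Optional`, so a caller can raise a clear error instead of failing later with an NPE.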

People

    • Assignee: Vinod KC (vinodkc)
    • Reporter: Mateusz Boryn (mateo7)
    • Votes: 0
    • Watchers: 2