  1. Hive
  2. HIVE-26531

UnsupportedOperationException while creating table in Avro format if column schema contains MAP with INTEGER key

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.2
    • None
    • None

    Description

      Describe the bug

      We are trying to create a table in the Avro data format through spark-sql. The table schema contains a MAP column whose key type is INT: MAP<INT, STRING>. The CREATE TABLE query fails with the following exception:

      22/08/29 12:03:38 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
      java.lang.UnsupportedOperationException: Key of Map can only be a String

      Here is the full stack trace, for reference: Avro_Map_StackTrace.txt

      The exception is raised by the following Hive code:

        private Schema createAvroMap(TypeInfo typeInfo) {
          TypeInfo keyTypeInfo = ((MapTypeInfo) typeInfo).getMapKeyTypeInfo();
          if (((PrimitiveTypeInfo) keyTypeInfo).getPrimitiveCategory()
              != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
          }
      
          TypeInfo valueTypeInfo = ((MapTypeInfo) typeInfo).getMapValueTypeInfo();
          Schema valueSchema = createAvroSchema(valueTypeInfo);
      
          return Schema.createMap(valueSchema);
        }
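      The guard can be observed in isolation with the following sketch. It is a simplified stand-in, not Hive code: the KeyCategory enum and the checkMapKey method are hypothetical names that only mimic the comparison against PrimitiveCategory.STRING shown above.

```java
final class MapKeyGuardSketch {
    // Hypothetical stand-in for Hive's PrimitiveCategory; only a few values are modeled.
    enum KeyCategory { STRING, INT, BIGINT }

    // Mimics the guard in createAvroMap: every key category except STRING is rejected.
    static void checkMapKey(KeyCategory keyCategory) {
        if (keyCategory != KeyCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
        }
    }

    public static void main(String[] args) {
        // Try each modeled key category and report whether the guard lets it through.
        for (KeyCategory category : KeyCategory.values()) {
            try {
                checkMapKey(category);
                System.out.println(category + ": accepted");
            } catch (UnsupportedOperationException e) {
                System.out.println(category + ": rejected (" + e.getMessage() + ")");
            }
        }
    }
}
```

      Under this model, MAP<STRING, STRING> passes the check while MAP<INT, STRING> fails with the same message as in the log above.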

      To Reproduce

      On Spark 3.2.1 (commit 4f25b3f712), using spark-sql with the Avro package:

      $SPARK_HOME/bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1
      

      Execute the following:

      create table avro_map(c1 MAP<INT, STRING>)
        ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
        STORED AS
          INPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";

      Expected behavior

      We expect the table to be created successfully in Avro format when a column is a MAP with an INT key. The equivalent CREATE TABLE statements succeed with other formats such as Parquet and ORC.

      Here is a simplified example showing expected behavior using the Parquet & ORC file formats:

      spark-sql> create table orc_map(c1 MAP<INT, STRING>) STORED AS ORC;
      Time taken: 0.196 seconds
      spark-sql> create table parquet_map(c1 MAP<INT, STRING>) STORED AS PARQUET;
      Time taken: 0.113 seconds
      spark-sql> desc orc_map;
      c1                      map<int,string>
      Time taken: 0.387 seconds, Fetched 1 row(s)
      spark-sql> desc parquet_map;
      c1                      map<int,string>
      Time taken: 0.077 seconds, Fetched 1 row(s)
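
      A possible client-side workaround, offered as a sketch rather than a fix: the Avro specification defines map keys as strings, so integer keys can be converted to strings before the data is written, with the column declared as MAP<STRING, STRING> instead. The class and method names below are hypothetical and are not part of Hive or Spark; whether stringified keys are acceptable depends on how the data is read back.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AvroMapKeyWorkaround {
    // Hypothetical helper: copies the map, replacing each integer key
    // with its decimal string form. Insertion order is preserved.
    static Map<String, String> stringifyKeys(Map<Integer, String> source) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : source.entrySet()) {
            result.put(String.valueOf(entry.getKey()), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> original = new LinkedHashMap<>();
        original.put(1, "one");
        original.put(2, "two");
        // prints {1=one, 2=two}; the keys are now strings
        System.out.println(stringifyKeys(original));
    }
}
```

      Readers of such a table would need to parse the keys back to integers, for example with Integer.parseInt.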

          People

            Assignee: Unassigned
            Reporter: x/sys (xsys)
            Votes: 0
            Watchers: 1
