Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
3.1.2
None
None
Description
Describe the bug
We are trying to create a table in the Avro data format through spark-sql. The schema contains a MAP column whose key type is INT: MAP<INT, STRING>. The CREATE TABLE query fails with the following exception:
22/08/29 12:03:38 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
java.lang.UnsupportedOperationException: Key of Map can only be a String
Here is the full stack trace, for reference: Avro_Map_StackTrace.txt
The exception is raised by the following Hive code:
    private Schema createAvroMap(TypeInfo typeInfo) {
      TypeInfo keyTypeInfo = ((MapTypeInfo) typeInfo).getMapKeyTypeInfo();
      if (((PrimitiveTypeInfo) keyTypeInfo).getPrimitiveCategory()
          != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
        throw new UnsupportedOperationException("Key of Map can only be a String");
      }

      TypeInfo valueTypeInfo = ((MapTypeInfo) typeInfo).getMapValueTypeInfo();
      Schema valueSchema = createAvroSchema(valueTypeInfo);

      return Schema.createMap(valueSchema);
    }
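To make the guard above easier to see in isolation, here is a minimal, dependency-free sketch of the same check. It is not Hive's code: the class `AvroMapKeyCheck`, the enum `KeyCategory`, and the method `toAvroMapSchema` are hypothetical stand-ins, and the schema is built as a plain JSON string rather than through the Avro library. It relies on one fact from the Avro specification: a map schema declares only its value type, because map keys are always strings.

```java
public class AvroMapKeyCheck {
    // Hypothetical stand-in for Hive's primitive categories; not the real enum.
    enum KeyCategory { STRING, INT, BIGINT }

    // Mirrors the guard in createAvroMap: any non-STRING key type is rejected.
    static String toAvroMapSchema(KeyCategory keyCategory, String valueAvroType) {
        if (keyCategory != KeyCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
        }
        // An Avro map schema names only the value type; keys are implicitly strings.
        return "{\"type\": \"map\", \"values\": \"" + valueAvroType + "\"}";
    }

    public static void main(String[] args) {
        // MAP<STRING, STRING> passes the check and yields a map schema.
        System.out.println(toAvroMapSchema(KeyCategory.STRING, "string"));

        // MAP<INT, STRING> is rejected, as in the report.
        try {
            toAvroMapSchema(KeyCategory.INT, "string");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, the failure comes from the key type alone: there is no place in an Avro map schema to record a key type other than string.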
To Reproduce
On Spark 3.2.1 (commit 4f25b3f712), using spark-sql with the Avro package:
$SPARK_HOME/bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1
Execute the following:
create table avro_map(c1 MAP<INT, STRING>)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
STORED AS
  INPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";
Expected behavior
We expect table creation to succeed in Avro format when a column is a MAP with an INT key. The same column definition works with other formats such as Parquet and ORC, which is consistent with this expectation.
Here is a simplified example showing expected behavior using the Parquet & ORC file formats:
spark-sql> create table orc_map(c1 MAP<INT, STRING>) STORED AS ORC;
Time taken: 0.196 seconds
spark-sql> create table parquet_map(c1 MAP<INT, STRING>) STORED AS PARQUET;
Time taken: 0.113 seconds
spark-sql> desc orc_map;
c1 map<int,string>
Time taken: 0.387 seconds, Fetched 1 row(s)
spark-sql> desc parquet_map;
c1 map<int,string>
Time taken: 0.077 seconds, Fetched 1 row(s)