Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 3.1.1
Description
The task is to load JSON files into a DataFrame.
Currently we use this method:
// textfile is an RDD[String], read from JSON files
val table = spark.table(hiveTableName)
val hiveSchema = table.schema
val df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(textfile)
The problem is that the field names in hiveSchema are all lower-case, while the fields in the JSON strings contain upper-case characters.
For example:
Hive schema:
(id bigint, name string)
JSON string:
{"Id":123, "Name":"Tom"}
In this case, the JSON string will not be loaded into the DataFrame.
I have to use the schema of the Hive table due to a business requirement; that is the precondition.
Currently I have to transform the keys in each JSON string to lower case, like {"id":123, "name":"Tom"},
but I was wondering whether there is a better solution for this issue?
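One possible alternative to rewriting the JSON text itself is to let Spark infer the schema from the data first, then rename each inferred column to its lower-case Hive name and cast it to the Hive type. This is only a sketch under assumptions not stated in the report: it assumes the mismatches are in top-level field names only (not nested structs), that every Hive column has a case-insensitive match in the JSON, and it reuses the `spark`, `textfile`, and `hiveSchema` values from the snippet above.

```scala
import org.apache.spark.sql.functions.col

// Read the JSON with its own inferred schema, so upper-case keys
// such as "Id" and "Name" survive as columns.
val raw = spark.read.option("mode", "DROPMALFORMED").json(textfile)

// For each field of the Hive schema, find the inferred column whose
// name matches ignoring case, cast it to the Hive type, and rename it.
val aligned = raw.select(
  hiveSchema.fields.map { f =>
    val src = raw.columns.find(_.equalsIgnoreCase(f.name)).getOrElse(f.name)
    col(src).cast(f.dataType).as(f.name)
  }: _*
)
```

The trade-off is an extra pass over the data for schema inference, but it avoids string-level rewriting of every JSON record.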