Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 3.1.1
Description
The task is to load JSON files into a DataFrame.
Currently we use this method:
// textfile is an RDD[String], read from JSON files
val table = spark.table(hiveTableName)
val hiveSchema = table.schema
val df = spark.read.option("mode", "DROPMALFORMED").schema(hiveSchema).json(textfile)
The problem is that the field names in hiveSchema are all lower-case, while the fields in the JSON strings contain upper-case characters.
For example:
Hive schema:
(id bigint, name string)
JSON string:
{"Id":123, "Name":"Tom"}
In this case, the JSON string will not be loaded into the DataFrame.
I have to use the schema of the Hive table due to a business requirement; that is the precondition.
Currently I have to transform the keys in each JSON string to lower case, like {"id":123, "name":"Tom"},
but I was wondering whether there is a better solution for this issue?
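One possible alternative to rewriting the JSON text itself is to let Spark infer the schema from the data first, then rename each inferred column to its lower-case Hive name and cast it to the Hive type. This is only a sketch under assumptions not stated in the report: it assumes the mismatches are in top-level field names only (not nested structs), that every Hive column has a case-insensitive match in the JSON, and it reuses the `spark`, `textfile`, and `hiveSchema` values from the snippet above.

```scala
import org.apache.spark.sql.functions.col

// Read the JSON with its own inferred schema, so upper-case keys
// such as "Id" and "Name" survive as columns.
val raw = spark.read.option("mode", "DROPMALFORMED").json(textfile)

// For each field of the Hive schema, find the inferred column whose
// name matches ignoring case, cast it to the Hive type, and rename it.
val aligned = raw.select(
  hiveSchema.fields.map { f =>
    val src = raw.columns.find(_.equalsIgnoreCase(f.name)).getOrElse(f.name)
    col(src).cast(f.dataType).as(f.name)
  }: _*
)
```

The trade-off is an extra pass over the data for schema inference, but it avoids string-level rewriting of every JSON record.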