Description
The issue appears to be a behaviour change between Spark 1.6 and 2.x in whether an empty string is treated as null for double and float fields.
{"a":"a1","int":1,"other":4.4} {"a":"a2","int":"","other":""}
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val config = new SparkConf().setMaster("local[5]").setAppName("test")
val sc = SparkContext.getOrCreate(config)
val sql = new SQLContext(sc)
val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
val df = sql.read.schema(null).json(file_path) // schema(null) is a no-op; the schema is inferred
df.show(30)
df.printSchema()
In Spark 1.6 the result is:
+---+----+-----+
|  a| int|other|
+---+----+-----+
| a1|   1|  4.4|
| a2|null| null|
+---+----+-----+
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
But in Spark 2.2 the result is:
+----+----+-----+
|   a| int|other|
+----+----+-----+
|  a1|   1|  4.4|
|null|null| null|
+----+----+-----+
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
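Note that in 2.2 even the string column a comes back null for the second record, which suggests the whole record is treated as malformed rather than only the numeric fields. A minimal sketch to confirm this, assuming the default PERMISSIVE parse mode and the default corrupt-record column name _corrupt_record:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[5]").appName("test").getOrCreate()

// Including the corrupt-record column in the user schema makes the raw
// malformed record visible instead of being silently nulled out.
val schema = StructType(Seq(
  StructField("a", StringType),
  StructField("int", LongType),
  StructField("other", DoubleType),
  StructField("_corrupt_record", StringType)))

val df = spark.read
  .schema(schema)
  .json(this.getClass.getClassLoader.getResource("Sanity4.json").getFile)
df.show(false)
// Expected: the second row is null in a/int/other, with the original JSON
// line captured in _corrupt_record.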
Another easy reproducer:
import spark.implicits._ // needed for .toDS

spark.read.schema("a DOUBLE, b FLOAT")
  .option("mode", "FAILFAST")
  .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
  .show() // forces parsing; FAILFAST throws on the empty strings
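A possible workaround, assuming the 1.6 behaviour is wanted back (a sketch, not part of the original report): read the numeric fields as strings and cast afterwards, since in Spark SQL CAST('' AS DOUBLE) yields null rather than a parse failure.

import org.apache.spark.sql.functions.col

// Read every field as a string, then cast; empty strings become nulls
// instead of turning the whole record into a malformed one.
val df = spark.read
  .schema("a STRING, int STRING, other STRING")
  .json(file_path) // file_path as defined in the first snippet
  .withColumn("int", col("int").cast("long"))
  .withColumn("other", col("other").cast("double"))
df.show()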