Description
The issue appears to be a behaviour change between Spark 1.6 and 2.x in whether an empty string is treated as null for double and float fields.
{"a":"a1","int":1,"other":4.4} {"a":"a2","int":"","other":""}
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val config = new SparkConf().setMaster("local[5]").setAppName("test")
val sc = SparkContext.getOrCreate(config)
val sql = new SQLContext(sc)
val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
val df = sql.read.schema(null).json(file_path) // schema(null) is a no-op; the schema is inferred
df.show(30)
df.printSchema()
In Spark 1.6 the result is:
+---+----+-----+
|  a| int|other|
+---+----+-----+
| a1|   1|  4.4|
| a2|null| null|
+---+----+-----+
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
But in Spark 2.2 the result is:
+----+----+-----+
|   a| int|other|
+----+----+-----+
|  a1|   1|  4.4|
|null|null| null|
+----+----+-----+
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
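Note that in 2.2 even the string column a comes back null for the second record, which suggests the whole record is treated as malformed rather than only the numeric fields. A minimal sketch to confirm this, assuming the default PERMISSIVE parse mode and the default corrupt-record column name _corrupt_record:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[5]").appName("test").getOrCreate()

// Including the corrupt-record column in the user schema makes the raw
// malformed record visible instead of being silently nulled out.
val schema = StructType(Seq(
  StructField("a", StringType),
  StructField("int", LongType),
  StructField("other", DoubleType),
  StructField("_corrupt_record", StringType)))

val df = spark.read
  .schema(schema)
  .json(this.getClass.getClassLoader.getResource("Sanity4.json").getFile)
df.show(false)
// Expected: the second row is null in a/int/other, with the original JSON
// line captured in _corrupt_record.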
Another easy reproducer:
import spark.implicits._ // needed for .toDS

spark.read.schema("a DOUBLE, b FLOAT")
  .option("mode", "FAILFAST")
  .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
  .show() // forces parsing; FAILFAST throws on the empty strings
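A possible workaround, assuming the 1.6 behaviour is wanted back (a sketch, not part of the original report): read the numeric fields as strings and cast afterwards, since in Spark SQL CAST('' AS DOUBLE) yields null rather than a parse failure.

import org.apache.spark.sql.functions.col

// Read every field as a string, then cast; empty strings become nulls
// instead of turning the whole record into a malformed one.
val df = spark.read
  .schema("a STRING, int STRING, other STRING")
  .json(file_path) // file_path as defined in the first snippet
  .withColumn("int", col("int").cast("long"))
  .withColumn("other", col("other").cast("double"))
df.show()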