Spark / SPARK-25040

Empty string should be disallowed for data types except for string and binary types in JSON


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      The issue itself seems to be a behaviour change between Spark 1.6 and 2.x in whether an empty string is treated as null for non-string types such as double and float.

      {"a":"a1","int":1,"other":4.4}
      {"a":"a2","int":"","other":""}
      

      Code:

      import org.apache.spark.{SparkConf, SparkContext}
      import org.apache.spark.sql.SQLContext

      val config = new SparkConf().setMaster("local[5]").setAppName("test")
      val sc = SparkContext.getOrCreate(config)
      val sql = new SQLContext(sc)

      val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
      // No schema is supplied, so it is inferred from the JSON file.
      val df = sql.read.json(file_path)
      df.show(30)
      df.printSchema()
      

      In Spark 1.6, the result is:

      +---+----+-----+
      |  a| int|other|
      +---+----+-----+
      | a1|   1|  4.4|
      | a2|null| null|
      +---+----+-----+

      root
       |-- a: string (nullable = true)
       |-- int: long (nullable = true)
       |-- other: double (nullable = true)


      but in Spark 2.2, the result is:

      +----+----+-----+
      |   a| int|other|
      +----+----+-----+
      |  a1|   1|  4.4|
      |null|null| null|
      +----+----+-----+

      root
       |-- a: string (nullable = true)
       |-- int: long (nullable = true)
       |-- other: double (nullable = true)
      

      Another easy reproducer:

      import spark.implicits._  // needed for .toDS

      spark.read.schema("a DOUBLE, b FLOAT")
        .option("mode", "FAILFAST")
        .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
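
      With the fix in 3.0.0, the reproducer above is expected to fail fast instead of silently producing nulls. A minimal sketch of the intended behaviour (assuming a live SparkSession named `spark`; the exact exception raised is an assumption, not verified output):

      import org.apache.spark.sql.SparkSession

      // Sketch only: requires a running Spark 3.0.0+ session.
      val spark = SparkSession.builder().master("local[*]").appName("SPARK-25040").getOrCreate()
      import spark.implicits._

      val ds = Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS

      // FAILFAST: an empty string in a DOUBLE/FLOAT column is treated as a
      // malformed record, so the read should throw rather than return null.
      // spark.read.schema("a DOUBLE, b FLOAT").option("mode", "FAILFAST").json(ds).show()

      // PERMISSIVE (the default): the malformed row is kept, with null columns.
      spark.read.schema("a DOUBLE, b FLOAT").json(ds).show()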
      


People

    Assignee: viirya (L. C. Hsieh)
    Reporter: gurwls223 (Hyukjin Kwon)
