Spark / SPARK-26678

Empty values end up as quoted empty strings in CSV files


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.4.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Problem statement

      Prior to Spark version 2.4.0, empty string values were written to CSV as unquoted strings.

      From version 2.4.0, empty string values are written as "" in CSV files. This is a problem if the consuming application expects empty values not to be wrapped in quotes. That is certainly the case if the CSV is intended for Microsoft PowerBI, for example, as it doesn't handle CSV files with double quotes.

      The following code produces different results in different versions of Spark:

       

      Spark version: 2.3.0
      Code:
          val df = List("aa", "", "bb").toDF("name")
          df.coalesce(1).write.option("header", "true").csv("/23.csv")
      Result:
          name
          aa
          bb

      Spark version: 2.4.0
      Code:
          val df = List("aa", "", "bb").toDF("name")
          df.coalesce(1).write.option("header", "true").csv("/24.csv")
      Result:
          name
          aa
          ""
          bb

      Spark version: 2.4.0
      Code:
          val df = List("aa", "", "bb").toDF("name")
          df.coalesce(1).write.option("header", "true").option("quote", "").csv("/24-2.csv")
      Result:
          name
          aa
          ""
          bb
      

      If the intention was to produce standard-looking CSV files (even though a CSV standard doesn't exist), we still need a way to disable automatic quoting.

      Also, using

      option("quote", "\u0000")
      

      had no effect; double quotes were still used.

      Proposed solution

      Using the option

      option("quote", "")
      

      should disable quotes.
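
      For readers hitting the same behavior: the Spark 2.4 SQL migration guide describes a CSV option, emptyValue, that controls how empty strings are written, and states that setting it to an empty (unquoted) string restores the pre-2.4 output. The sketch below applies that option to the example used in this report. The local SparkSession setup, the object name, and the output path are illustrative assumptions and are not part of the original report.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone wrapper around the snippet from this report.
object EmptyValueCsvExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("empty-value-csv")
      .master("local[1]")
      .getOrCreate()
    import spark.implicits._

    // Assumption: illustrative output path, not taken from the report.
    val outputPath = "/tmp/24-empty-value.csv"

    val df = List("aa", "", "bb").toDF("name")

    df.coalesce(1)
      .write
      .mode("overwrite")          // avoid failing if the path already exists
      .option("header", "true")
      .option("emptyValue", "")   // write empty strings as nothing instead of ""
      .csv(outputPath)

    // Read the written files back as raw text to inspect the quoting.
    spark.read.textFile(outputPath).collect().foreach(println)

    spark.stop()
  }
}
```

      Running this against both 2.3.x and 2.4.x and comparing the printed lines is a quick way to check whether the option gives the output the consuming application expects. It leaves the quote character itself untouched, so values that genuinely need quoting are still quoted.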

       


          People

            Assignee: Unassigned
            Reporter: Robert V (rob_v)