SPARK-40678: JSON conversion of ArrayType is not properly supported in Spark 3.2 / Scala 2.13


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.0
    • Fix Version/s: None
    • Component/s: Input/Output
    • Labels: None

    Description

      In Spark 3.2 (Scala 2.13), values with ArrayType are no longer properly supported with JSON; e.g.

      import org.apache.spark.sql.SparkSession
      
      case class KeyValue(key: String, value: Array[Byte])
      
      val spark = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
      
      import spark.implicits._
      
      val df = Seq(Array(KeyValue("foo", "bar".getBytes))).toDF()
      
      df.foreach(r => println(r.json))
      

      Expected:

      [{foo, bar}]
      

      Encountered:

      java.lang.IllegalArgumentException: Failed to convert value ArraySeq([foo,[B@dcdb68f]) (class of class scala.collection.mutable.ArraySeq$ofRef}) with the type of ArrayType(Seq(StructField(key,StringType,false), StructField(value,BinaryType,false)),true) to JSON.
      	at org.apache.spark.sql.Row.toJson$1(Row.scala:604)
      	at org.apache.spark.sql.Row.jsonValue(Row.scala:613)
      	at org.apache.spark.sql.Row.jsonValue$(Row.scala:552)
      	at org.apache.spark.sql.catalyst.expressions.GenericRow.jsonValue(rows.scala:166)
      	at org.apache.spark.sql.Row.json(Row.scala:535)
      	at org.apache.spark.sql.Row.json$(Row.scala:535)
      	at org.apache.spark.sql.catalyst.expressions.GenericRow.json(rows.scala:166)
      
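A possible workaround (a sketch, not taken from this ticket) is to avoid `Row.json` entirely and serialize on the DataFrame side with the built-in `to_json` and `col` functions, which go through Catalyst's JSON writer rather than the Scala-collection pattern match in `Row.toJson` that fails on 2.13's `ArraySeq`. The column name `value` below is an assumption based on `toDF()`'s default naming for a single-column Dataset:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_json}

case class KeyValue(key: String, value: Array[Byte])

object Workaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("test").getOrCreate()
    import spark.implicits._

    // Same reproduction data as in the description.
    val df = Seq(Array(KeyValue("foo", "bar".getBytes))).toDF()

    // Serialize each row with Catalyst's to_json instead of Row.json,
    // sidestepping the ArraySeq match that throws on Scala 2.13.
    // BinaryType fields are emitted as base64 strings by the JSON writer.
    df.select(to_json(col("value"))).collect().foreach(r => println(r.getString(0)))

    spark.stop()
  }
}
```

Note that the JSON produced this way encodes the `Array[Byte]` field as base64 rather than a raw string, so the output differs in shape from what `Row.json` would have produced.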


People

    Assignee: Unassigned
    Reporter: Cédric Chantepie (cchantepie)
    Votes: 0
    Watchers: 2
