SPARK-47385

Tuple encoder produces wrong results with Option inputs

    Description

       

      The behavior of tupled encoders on the Option type was changed by https://github.com/apache/spark/pull/40755.

      import org.apache.spark.sql.{Encoders, Encoder} 
      
      case class Required(name: String)
      case class Optional(name: String)
      
      implicit val enc: Encoder[(Required, Option[Optional])] = Encoders.tuple(Encoders.product[Required], Encoders.product[Option[Optional]]) 
       
      spark.createDataFrame(Seq( 
      (Required("1"), Some(Optional("1"))), 
      (Required("2"), None) 
      )).as[(Required, Option[Optional])].collect()

      Before the PR, the result is:

      Array((Required(1),Some(Optional(1))), (Required(2),None))

      After the PR, the result is:

      Array((Required(1),Some(Optional(1))), (Required(2),null)) 

      which is incorrect because the original input is None rather than null.
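
      To illustrate why a null in an Option-typed position matters to callers, here is a minimal plain-Scala sketch that does not use Spark. It simulates the affected row by hand; the object name NullVsNone and the helper normalize are hypothetical, introduced only for illustration, and the normalization is a defensive measure on the consumer side, not the fix for this issue.

```scala
// Stand-ins mirroring the case classes in the report.
case class Required(name: String)
case class Optional(name: String)

// Hypothetical helper object, not part of Spark.
object NullVsNone {
  // Option(x) maps a null reference to None; flatten then collapses
  // Option[Option[A]] back to Option[A].
  def normalize[A](value: Option[A]): Option[A] = Option(value).flatten

  def main(args: Array[String]): Unit = {
    // Hand-built simulation of the row returned after the PR:
    // a null reference where None was expected.
    val affected: (Required, Option[Optional]) = (Required("2"), null)

    // Calling any method on the null reference throws a
    // NullPointerException, which Try captures as a Failure.
    val direct = scala.util.Try(affected._2.isEmpty)
    println(direct.isFailure)

    // After normalization the value behaves like the original None,
    // and a genuine Some is left unchanged.
    println(normalize(affected._2) == None)
    println(normalize(Some(Optional("1"))) == Some(Optional("1")))
  }
}
```

      Code that pattern-matches on Some/None or calls methods such as isEmpty on the second tuple element would fail on the post-PR result, which is why null in place of None is a correctness problem rather than a cosmetic one.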

       

            People

              mashplant Chenhao Li