[SPARK-26984] Incompatibility between Spark releases - Some(null) - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Not A Problem
Affects Version/s: 2.4.0
Fix Version/s: None
Component/s: Spark Core
Labels:
- newbie
Environment:

Linux CentOS, Databricks.

Flags:

Patch
External issue URL:
https://stackoverflow.com/questions/54851205/why-does-somenull-throw-nullpointerexception-in-spark-2-4-but-worked-in-2-2/54861152#54861152
External issue ID:
Why does Some(null) throw NullPointerException in Spark 2.4 (but worked in 2.2)?

Description

Please refer to https://stackoverflow.com/questions/54851205/why-does-somenull-throw-nullpointerexception-in-spark-2-4-but-worked-in-2-2/54861152#54861152.

NB: I am not sure the priority is correct; I expect it will be evaluated.

Consider the following code:

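// Assumes a SparkSession named `spark` (predefined in spark-shell and Databricks notebooks)
import spark.implicits._

val df = Seq(
  (1, Some("a"), Some(1)),
  (2, Some(null), Some(2)),
  (3, Some("c"), Some(3)),
  (4, None, None)).toDF("c1", "c2", "c3")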

In Spark 2.2.1 (on MapR), Some(null) works fine. In Spark 2.4.0 on Databricks, the same code fails with the following error:

java.lang.RuntimeException: Error while encoding: java.lang.NullPointerException
assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._1 AS _1#6
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, unwrapoption(ObjectType(class java.lang.String), assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._2), true, false) AS _2#7
unwrapoption(IntegerType, assertnotnull(assertnotnull(input[0, scala.Tuple3, true]))._3) AS _3#8
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:293)
  at org.apache.spark.sql.SparkSession.$anonfun$createDataset$1(SparkSession.scala:472)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
  at scala.collection.immutable.List.foreach(List.scala:388)
  at scala.collection.TraversableLike.map(TraversableLike.scala:233)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
  at scala.collection.immutable.List.map(List.scala:294)
  at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:472)
  at org.apache.spark.sql.SQLContext.createDataset(SQLContext.scala:377)
  at org.apache.spark.sql.SQLImplicits.localSeqToDatasetHolder(SQLImplicits.scala:228)
  ... 57 elided
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
  at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:289)
  ... 66 more
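The difference rests on plain Scala semantics rather than anything Spark-specific: `Some(null)` is a non-empty `Option` that carries a null payload, whereas `Option(null)` evaluates to `None`. The trace appears consistent with this, since the encoder's `unwrapoption` step seems to extract the null `String` from a defined `Option` before the row writer sees it. The minimal sketch below shows the distinction without Spark; the object name `SomeNullSemantics` is hypothetical.

```scala
// Hypothetical demo object: contrasts Some(null) with Option(null) in plain Scala.
object SomeNullSemantics {
  // Some(null) is a *defined* Option that happens to carry a null payload
  val someNull: Option[String] = Some(null)
  // Option.apply checks its argument for null and returns None instead
  val fromApply: Option[String] = Option(null)

  def main(args: Array[String]): Unit = {
    println(someNull.isDefined)   // true: not treated as a missing value
    println(someNull.get == null) // true: but the value inside is null
    println(fromApply.isEmpty)    // true: Option(null) is None
  }
}
```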

One could argue that this can be solved in other ways, but existing code bases may well be affected.
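One possible workaround, assuming that `Some(null)` in the source data is meant to represent a missing value, is to normalize such values to `None` before calling `toDF`. This is a sketch, not an official fix. `NormalizeOptions` and `normalize` are hypothetical names, and the Spark call is left as a comment so the snippet runs in plain Scala.

```scala
// Hypothetical helper: collapse Some(null) into None so that every missing
// value is represented the same way before a DataFrame is built.
object NormalizeOptions {
  // flatMap(Option(_)) keeps Some("a") and None unchanged; only Some(null) becomes None
  def normalize[A](o: Option[A]): Option[A] = o.flatMap(Option(_))

  val rows: Seq[(Int, Option[String], Option[Int])] = Seq(
    (1, Some("a"), Some(1)),
    (2, Some(null), Some(2)),
    (3, Some("c"), Some(3)),
    (4, None, None))

  val cleaned: Seq[(Int, Option[String], Option[Int])] =
    rows.map { case (c1, c2, c3) => (c1, normalize(c2), normalize(c3)) }

  def main(args: Array[String]): Unit = {
    cleaned.foreach(println)
    // (1,Some(a),Some(1))
    // (2,None,Some(2))
    // (3,Some(c),Some(3))
    // (4,None,None)

    // In Spark, assuming `import spark.implicits._` is in scope:
    // val df = cleaned.toDF("c1", "c2", "c3")
  }
}
```

For the `Int` column the normalization is a no-op. It is applied only so that every column goes through the same treatment.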


People

Assignee: Unassigned

Reporter: Gerard Alexander

Shepherd: Jacek Laskowski

Votes: 0

Watchers: 4

Dates

Created: 25/Feb/19 08:50

Updated: 03/Mar/19 18:46

Resolved: 01/Mar/19 21:04