Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 0.8.1
- Fix Version/s: None
- Component/s: None
- Environment:
  - Zeppelin v0.8.1
  - Spark v2.4.0 (1 Master, N Workers)
  - Hadoop (Embedded, Maybe v2.7.x)
  - The interpreter is instantiated Per Note in scoped process
Description
When I defined my own function using the UDF (User-Defined Functions) feature, I got an error message like this:
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
I defined just this simple function:
import java.text.SimpleDateFormat

def diffHour(s1: String, s2: String): Long = {
  var hour = 0L
  try {
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    val d1 = sdf.parse(s1)
    val d2 = sdf.parse(s2)
    hour = d2.getTime - d1.getTime
    hour /= 1000 * 60 * 60
  } catch {
    case e: Exception => hour = -1
  }
  hour
}
And registered it with the Spark SQL context:
sqlContext.udf.register("diffHour", diffHour _)
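For comparison, here is a sketch of registering the same logic as a self-contained anonymous function instead of the method reference `diffHour _`, so the closure does not capture the interpreter's generated wrapper object. The name `diffHourFn` is illustrative, and whether this avoids the problem in scoped mode is an assumption, not something I have verified; the Spark call is shown commented out.

```scala
import java.text.SimpleDateFormat

// Self-contained anonymous function: captures no outer (REPL) instance,
// unlike the eta-expanded method reference `diffHour _`.
val diffHourFn = (s1: String, s2: String) => {
  var hour = 0L
  try {
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    // Millisecond difference converted to whole hours.
    hour = (sdf.parse(s2).getTime - sdf.parse(s1).getTime) / (1000L * 60 * 60)
  } catch {
    case e: Exception => hour = -1
  }
  hour
}

// Hypothetical registration, assuming the same `sqlContext` as above:
// sqlContext.udf.register("diffHour", diffHourFn)
```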
I then expected to be able to use my function in SQL:
%sql
SELECT
id,
time,
diffHour(time, '2019-01-01 00:00:00') as hour
FROM users
But the error I mentioned at the start occurred.
I was using the Per Note and scoped settings for the Spark interpreter. When I changed the interpreter setting to Globally, the error no longer occurred.
How can I fix it?
Please help me.
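A workaround sketch, assuming the root cause is the interpreter's REPL wrapper object being pulled into the serialized closure: define the function inside a standalone serializable object and register a reference to it. The object name `DateUdfs` is hypothetical, and I have not verified this against the scoped per-note setup described above.

```scala
import java.text.SimpleDateFormat

// Hypothetical wrapper object: a top-level serializable holder, so the
// registered function carries no implicit reference to the REPL session.
object DateUdfs extends Serializable {
  def diffHour(s1: String, s2: String): Long = {
    try {
      val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
      // Millisecond difference converted to whole hours.
      (sdf.parse(s2).getTime - sdf.parse(s1).getTime) / (1000L * 60 * 60)
    } catch {
      case e: Exception => -1L
    }
  }
}

// Hypothetical registration, assuming the same `sqlContext` as above:
// sqlContext.udf.register("diffHour", DateUdfs.diffHour _)
```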