Description
CDH 5.7.x backported RpcEnv and eliminated the class server in the Spark 1.6.0 REPL:
https://github.com/cloudera/spark/commit/e0d03eb30e03f589407c3cf37317a64f18db8257
A fix was attempted:
https://github.com/apache/zeppelin/commit/78c7b5567e7fb4985cecf147c39033c554dfc208
Although basic Spark operations work in Zeppelin after this fix, the following code still fails:
val rdd2 = sc.parallelize(Seq(1,2,3,4,5))
rdd2.filter(_ > 3).count()
The lambda expression is not being shipped to the executors:
java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
As far as I understand, Zeppelin supports RpcEnv only for Scala 2.11, via the -Yrepl-outdir option, which is not available in Scala 2.10.
Another way of supporting RpcEnv could be to use the spark-submit approach of serving the new classes through the RpcEnv. Here's what I've hacked together and have working locally, though I'm having trouble testing my pull request:
1. In SparkInterpreter.createSparkContext_1(), if classServerUri is still null after both checks, invoke intp.getClassOutputDirectory() via reflection.
2. Use the returned directory to set the spark.repl.class.outputDir parameter on sparkConf.
The same approach could be used for Spark 2.0 as well, eliminating the additional HTTP server running inside Zeppelin to serve lambda classes.
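The two steps above could be sketched as follows. This is only an illustration of the reflection hack, assuming the backported interpreter exposes getClassOutputDirectory (as described above); the helper name classOutputDir and the silent fallback to None are my own, not part of any existing Zeppelin code.

```scala
import java.io.File

// Look up getClassOutputDirectory on the REPL interpreter via reflection,
// since the method only exists in builds that backported RpcEnv (e.g. CDH 5.7.x).
def classOutputDir(intp: AnyRef): Option[File] =
  try {
    val m = intp.getClass.getMethod("getClassOutputDirectory")
    Some(m.invoke(intp).asInstanceOf[File])
  } catch {
    // Stock Spark 1.6.0 interpreters without the backport land here.
    case _: NoSuchMethodException => None
  }

// Inside createSparkContext_1(), when classServerUri is null after both checks:
// classOutputDir(intp).foreach { dir =>
//   conf.set("spark.repl.class.outputDir", dir.getAbsolutePath)
// }
```

With spark.repl.class.outputDir set, executors can fetch the generated $iwC...$anonfun classes over the RPC channel instead of the removed class server.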
Issue Links
- blocks: ZEPPELIN-1347 Release 0.6.2 (Resolved)