Details
Type: Wish
Status: Resolved
Priority: P2
Resolution: Invalid
None
None
Description
As discussed on the Beam dev list, we should have a second Spark runner based on the Dataset API.
As part of this the Spark runner will have three modules: runner-spark-core, runner-spark-rdd (Spark 1.6.x) and runner-spark-dataset (Spark 2.x).
This work should go into a feature branch (the runner-spark2 branch already exists).
This ticket covers creating a skeleton for the module structure above and porting everything that can easily be ported from the current runner.
Some of this work is already in the existing feature branch, but much has changed since that branch was last updated.
Issue Links
- is superseded by
  BEAM-8470 Create a new Spark runner based on Spark Structured streaming framework (Triage Needed)