Spark / SPARK-42471

Distributed ML <> Spark Connect


Details

    • Type: Umbrella
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: Connect, ML
    • Labels: None

    Sub-Tasks

      1. Make Spark Connect support canceling job groups (Sub-task; Open; Unassigned)
      2. High-level design doc for Distributed ML <> Spark Connect (Sub-task; Resolved; Weichen Xu)
      3. Initial prototype implementation for PySpark ML (Sub-task; Resolved; Weichen Xu)
      4. Extract the common .ml classes to `mllib-common` (Sub-task; Resolved; Ruifeng Zheng)
      5. Make LiteralExpression support arrays (Sub-task; Resolved; Ruifeng Zheng)
      6. Factor literal value conversion out to connect-common (Sub-task; Resolved; Ruifeng Zheng)
      7. Helper function to convert a proto literal to a value in the Python client (Sub-task; Resolved; Ruifeng Zheng)
      8. Implement ML functions `array_to_vector` and `vector_to_array` (Sub-task; Resolved; Ruifeng Zheng)
      9. Move `toCatalystValue` to connect-common (Sub-task; Resolved; Ruifeng Zheng)
      10. Make Torch Distributor compatible with Spark Connect (Sub-task; Resolved; Ruifeng Zheng)
      11. Support local mode in Torch Distributor (Sub-task; Resolved; Ruifeng Zheng)
      12. Add a Torch Distributor data loader that loads data from Spark partitions (Sub-task; Resolved; Weichen Xu)
      13. Implement a PySpark ML logistic regression estimator on top of Torch Distributor (Sub-task; Resolved; Weichen Xu)
      14. Basic estimator / transformer / model / evaluator interfaces and basic transformer / evaluator implementations (Sub-task; Resolved; Weichen Xu)
      15. Add a Spark DataFrame writer for the binary file format (Sub-task; Resolved; Weichen Xu)
      16. Add API `copyLocalFileToHadoopFS` (Sub-task; Resolved; Weichen Xu)
      17. Basic saving / loading implementation (Sub-task; Resolved; Weichen Xu)
      18. Implement pipeline estimator (Sub-task; Resolved; Weichen Xu)
      19. Implement cross validator estimator (Sub-task; Resolved; Weichen Xu)
      20. Move namespace from `pyspark.mlv2` to `pyspark.ml.connect` (Sub-task; Resolved; Weichen Xu)
      21. Implement classification evaluator (Sub-task; Resolved; Weichen Xu)
      22. Add example code (Sub-task; Resolved; Weichen Xu)
      23. Add PySpark "ml-connect" extras dependencies (Sub-task; Open; Unassigned)
      24. Prevent Spark Connect ML models from changing the input pandas DataFrame (Sub-task; Resolved; Weichen Xu)
      25. Add doc entry for the `pyspark.ml.connect` module (Sub-task; Resolved; Weichen Xu)
      26. Add vector assembler feature transformer (Sub-task; Resolved; Weichen Xu)
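      Several sub-tasks above (the basic estimator / transformer / model interfaces, the pipeline estimator, the vector assembler, and keeping models from changing their input data) describe a common interface pattern. The plain-Python sketch below illustrates that pattern only; it is not the `pyspark.ml.connect` API. Every class name and signature here (`Transformer`, `Model`, `Estimator`, `Pipeline`, `PipelineModel`, `VectorAssembler`, and the toy `MeanCenter` estimator) is hypothetical, and datasets are modeled as lists of dicts rather than Spark or pandas DataFrames so the example runs without Spark.

```python
# Hypothetical sketch of the estimator / transformer / model / pipeline pattern.
# NOT the pyspark.ml.connect API; datasets are plain lists of dicts.
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence

Row = Dict[str, object]


class Transformer(ABC):
    """Maps a dataset to a new dataset without mutating the input rows."""

    @abstractmethod
    def transform(self, rows: List[Row]) -> List[Row]: ...


class Model(Transformer):
    """A Transformer produced by fitting an Estimator."""


class Estimator(ABC):
    """Learns parameters from a dataset and returns a fitted Model."""

    @abstractmethod
    def fit(self, rows: List[Row]) -> Model: ...


class VectorAssembler(Transformer):
    """Concatenates numeric input columns into one list-valued output column."""

    def __init__(self, input_cols: Sequence[str], output_col: str) -> None:
        self.input_cols = list(input_cols)
        self.output_col = output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new dicts so the caller's rows are left unchanged.
        return [
            {**row, self.output_col: [float(row[c]) for c in self.input_cols]}
            for row in rows
        ]


class MeanCenterModel(Model):
    """Subtracts a previously learned mean from one column."""

    def __init__(self, col: str, mean: float) -> None:
        self.col = col
        self.mean = mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**row, self.col: row[self.col] - self.mean} for row in rows]


class MeanCenter(Estimator):
    """Toy estimator: learns the mean of one column."""

    def __init__(self, col: str) -> None:
        self.col = col

    def fit(self, rows: List[Row]) -> Model:
        mean = sum(row[self.col] for row in rows) / len(rows)
        return MeanCenterModel(self.col, mean)


class PipelineModel(Model):
    """Applies fitted stages in order."""

    def __init__(self, stages: List[Transformer]) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits stages in order; each stage sees the output of the earlier stages."""

    def __init__(self, stages: List[object]) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> Model:
        fitted: List[Transformer] = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # replace the estimator by its fitted model
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)
```

      For example, fitting `Pipeline([MeanCenter("x"), VectorAssembler(["x", "y"], "features")])` on two rows with `x` values 1.0 and 3.0 and `y` values 10.0 and 20.0 learns a mean of 2.0, and the resulting model produces `features` of `[-1.0, 10.0]` and `[1.0, 20.0]` while leaving the input rows unchanged. Returning fresh dicts from every `transform` is one simple way to meet the "do not change the input" requirement; a DataFrame-based implementation would need an equivalent copy or a non-mutating API.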


        People

          Weichen Xu (weichenxu123)
          Ruifeng Zheng (podongfeng)
          Votes: 0
          Watchers: 4
