SPARK-42497: Support of pandas API on Spark for Spark Connect

Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Connect
    • Labels: None

    Description

      We should enable `pandas API on Spark` on Spark Connect.
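
For illustration, here is a minimal sketch of what using pandas API on Spark through a Spark Connect session might look like once this umbrella is resolved. It assumes PySpark 3.5 or later installed with the Connect dependencies and a Spark Connect server that is already running; the host, the port (15002 is the usual default), and the helper names `connect_url` and `column_sum_over_connect` are hypothetical choices for this example, not part of the ticket.

```python
def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL (hypothetical helper; host and port are assumptions)."""
    return f"sc://{host}:{port}"


def column_sum_over_connect(remote_url: str) -> int:
    """Sum a column with pandas API on Spark over a Spark Connect session.

    Imports are deferred so that this module loads even where PySpark
    is not installed; calling the function requires a reachable server.
    """
    from pyspark.sql import SparkSession

    # Create (or reuse) a remote session instead of a local SparkContext.
    spark = SparkSession.builder.remote(remote_url).getOrCreate()

    # Build a Spark Connect DataFrame and view it through pandas API on Spark.
    sdf = spark.createDataFrame([(1, 4), (2, 5), (3, 6)], ["a", "b"])
    psdf = sdf.pandas_api()

    # pandas-style operations are translated into Spark Connect plans.
    return int(psdf["a"].sum())


# Example call, assuming a server is listening on the default address:
# column_sum_over_connect(connect_url())
```

The sub-tasks listed under this issue cover the pieces such a call path relies on, such as the default index, `createDataFrame`, and the comparison operators on Spark Connect columns.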

      Sub-Tasks

        1. Add proto message for pandas API on Spark default index | Sub-task | Resolved | Haejoon Lee
        2. Fix `default_session` to work properly | Sub-task | Resolved | Unassigned
        3. Fix `createDataFrame` to work properly with rows and schema | Sub-task | Resolved | Unassigned
        4. Refactor the withSequenceColumn | Sub-task | Resolved | Hyukjin Kwon
        5. Basic support for pandas API on Spark | Sub-task | Resolved | Haejoon Lee
        6. Reuse transformUnregisteredFunction for DistributedSequenceID. | Sub-task | Resolved | Unassigned
        7. Assign JIRA tickets and add comments for all failing tests. | Sub-task | Resolved | Haejoon Lee
        8. Enable doctest | Sub-task | Resolved | Unassigned
        9. Clean-up remaining mypy failure | Sub-task | Resolved | Unassigned
        10. Removing the dependency on `grpcio` when remote session is not used. | Sub-task | Resolved | Haejoon Lee
        11. Separate test into `pyspark-connect-pandas` and `pyspark-connect-pandas-slow` | Sub-task | Resolved | Unassigned
        12. Refine `column_op` to use lambda function instead of Column API. | Sub-task | Resolved | Unassigned
        13. Enable `InternalFrame.attach_distributed_column` in Spark Connect. | Sub-task | Resolved | Haejoon Lee
        14. Fix unexpected `AnalysisException` from Spark Connect client | Sub-task | Resolved | Ruifeng Zheng
        15. Enable pyspark.pandas.spark.functions.covar in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        16. Enable DataFrameSlowParityTests.test_eval | Sub-task | Resolved | Ruifeng Zheng
        17. Enable pyspark.pandas.spark.functions.mode in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        18. Enable pyspark.pandas.spark.functions.product in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        19. Fix pyspark.sql.column._unary_op to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        20. Enable DataFrameSlowParityTests.test_udt | Sub-task | Resolved | Unassigned
        21. Support `Column` for SparkConnectColumn.__getitem__ | Sub-task | Resolved | Haejoon Lee
        22. Enable pyspark.pandas.spark.functions.repeat in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        23. Enable pyspark.pandas.spark.functions.var in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        24. Enable DefaultIndexParityTests.test_index_distributed_sequence_cleanup. | Sub-task | Resolved | Unassigned
        25. Enable ExponentialMovingLike.mean with Spark Connect | Sub-task | Resolved | Haejoon Lee
        26. Enable pyspark.pandas.spark.functions.kurt in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        27. Enable pyspark.pandas.spark.functions.skew in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        28. Enable SparkContext-related tests with Spark Connect | Sub-task | Resolved | Haejoon Lee
        29. Enable RDD dependent tests with Spark Connect | Sub-task | Resolved | Haejoon Lee
        30. Implement `localCheckpoint` for Spark Connect DataFrame | Sub-task | Resolved | Unassigned
        31. Enable Series.interpolate with Spark Connect | Sub-task | Resolved | Haejoon Lee
        32. Enable pyspark.pandas.spark.functions.stddev in Spark Connect. | Sub-task | Resolved | Ruifeng Zheng
        33. Enable GroupBy.rank with Spark Connect | Sub-task | Resolved | Unassigned
        34. Enable GroupBySlowParityTests.test_split_apply_combine_on_series | Sub-task | Resolved | Unassigned
        35. Enable InternalFrameParityTests.test_from_pandas | Sub-task | Resolved | Haejoon Lee
        36. Enable NamespaceParityTests.test_get_index_map | Sub-task | Resolved | Haejoon Lee
        37. Enable NumPy compat tests | Sub-task | Resolved | Haejoon Lee
        38. Fix unexpected `SparkConnectGrpcException` from Spark Connect client | Sub-task | Resolved | Haejoon Lee
        39. Enable OpsOnDiffFramesEnabledSlowParityTests.test_series_eq | Sub-task | Resolved | Haejoon Lee
        40. Enable `resample` with Spark Connect | Sub-task | Resolved | Haejoon Lee
        41. Enable ReshapeParityTests.test_get_dummies_date_datetime | Sub-task | Resolved | Unassigned
        42. Enable ReshapeParityTests.test_merge_asof | Sub-task | Resolved | Takuya Ueshin
        43. Enable SeriesParityTests.test_compare | Sub-task | Resolved | Haejoon Lee
        44. Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests | Sub-task | Resolved | Haejoon Lee
        45. Enable PandasSQLStringFormatter.vformat to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        46. Fix BinaryOps.ge to work with Spark Connect Column | Sub-task | Resolved | Haejoon Lee
        47. Fix BinaryOps.gt to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        48. Fix BinaryOps.le to work with Spark Connect Column. | Sub-task | Resolved | Unassigned
        49. Fix BinaryOps.lt to work with Spark Connect Column. | Sub-task | Resolved | Unassigned
        50. Enable CategoricalOps.eq to work with Spark Connect | Sub-task | Resolved | Unassigned
        51. Enable CategoricalOps.ge to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        52. Enable CategoricalOps.gt to work with Spark Connect. | Sub-task | Resolved | Unassigned
        53. Enable CategoricalOps.le to work with Spark Connect. | Sub-task | Resolved | Unassigned
        54. Enable CategoricalOps.lt to work with Spark Connect. | Sub-task | Resolved | Unassigned
        55. Enable CategoricalOps.ne to work with Spark Connect. | Sub-task | Resolved | Unassigned
        56. Fix DatetimeOps.ge to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        57. Fix DatetimeOps.gt to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        58. Fix DatetimeOps.le to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        59. Fix DatetimeOps.lt to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        60. Fix NullOps.ge to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        61. Fix NullOps.gt to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        62. Fix NullOps.le to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        63. Fix NullOps.lt to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        64. Fix NullOps.eq to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        65. Fix NullOps.ne to work with Spark Connect Column. | Sub-task | Resolved | Haejoon Lee
        66. Enable NumOpsParityTests.test_eq | Sub-task | Resolved | Haejoon Lee
        67. Enable NumOpsParityTests.test_ge | Sub-task | Resolved | Haejoon Lee
        68. Enable NumOpsParityTests.test_gt | Sub-task | Resolved | Haejoon Lee
        69. Enable NumOpsParityTests.test_le | Sub-task | Resolved | Haejoon Lee
        70. Enable NumOpsParityTests.test_lt | Sub-task | Resolved | Haejoon Lee
        71. Enable NumOpsParityTests.test_ne. | Sub-task | Resolved | Haejoon Lee
        72. Fix StringOps.ge to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        73. Fix StringOps.gt to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        74. Fix StringOps.le to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        75. Fix StringOps.lt to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        76. Fix TimedeltaOps.ge to work with Spark Connect | Sub-task | Resolved | Haejoon Lee
        77. Fix TimedeltaOps.gt to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        78. Fix TimedeltaOps.le to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        79. Fix TimedeltaOps.lt to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        80. Fix TimedeltaOps.rsub to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        81. Fix TimedeltaOps.sub to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        82. Fix pyspark.sql.pandas.types.to_arrow_type to work with Spark Connect | Sub-task | Resolved | Takuya Ueshin
        83. Enable IndexesParityTests.test_monotonic | Sub-task | Resolved | Unassigned
        84. Enable IndexesParityTests.test_to_series | Sub-task | Resolved | Haejoon Lee
        85. Support functions.date_part for Spark Connect | Sub-task | Resolved | Haejoon Lee
        86. Support `pyspark.ml.feature.Bucketizer` and `pyspark.mllib.stat.KernelDensity` to work with Spark Connect. | Sub-task | Resolved | Haejoon Lee
        87. Enable SeriesPlotMatplotlibParityTests.test_line_plot | Sub-task | Resolved | Haejoon Lee
        88. Enable SeriesPlotMatplotlibParityTests.test_pie_plot. | Sub-task | Resolved | Haejoon Lee
        89. Enable FrameParityBinaryOpsTests.test_binary_operator_multiply | Sub-task | Resolved | Unassigned
        90. Cleanup & consolidate tickets to simplify the tasks. | Sub-task | Resolved | Haejoon Lee
        91. Add util to get proper Column or DataFrame class for Spark Connect. | Sub-task | Resolved | Haejoon Lee
        92. Enable KernelDensity within Spark Connect | Sub-task | Resolved | Haejoon Lee
        93. Resolve remaining AnalysisException | Sub-task | Resolved | Unassigned
        94. Make WidenSetOperationTypes retain the Plan_ID_TAG | Sub-task | Resolved | Ruifeng Zheng
        95. Make `ResolvePivot` retain the `Plan_ID_TAG` | Sub-task | Resolved | Ruifeng Zheng
        96. Support Series.empty for Spark Connect. | Sub-task | Resolved | Haejoon Lee

          People

            Assignee: Unassigned
            Reporter: Haejoon Lee (itholic)
            Votes: 0
            Watchers: 5
