Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-46684

CoGroup.applyInPandas/Arrow should pass arguments properly

    XMLWordPrintableJSON

Details

    Description

      In Spark Connect, CoGroup.applyInPandas/Arrow doesn't take arguments properly, so the arguments of the UDF can be broken:

      >>> import pandas as pd
      >>>
      >>> df1 = spark.createDataFrame(
      ...     [(1, 1.0, "a"), (2, 2.0, "b"), (1, 3.0, "c"), (2, 4.0, "d")], ("id", "v1", "v2")
      ... )
      >>> df2 = spark.createDataFrame([(1, "x"), (2, "y"), (1, "z")], ("id", "v3"))
      >>>
      >>> def summarize(left, right):
      ...     return pd.DataFrame(
      ...         {
      ...             "left_rows": [len(left)],
      ...             "left_columns": [len(left.columns)],
      ...             "right_rows": [len(right)],
      ...             "right_columns": [len(right.columns)],
      ...         }
      ...     )
      ...
      >>> df = (
      ...     df1.groupby("id")
      ...     .cogroup(df2.groupby("id"))
      ...     .applyInPandas(
      ...         summarize,
      ...         schema="left_rows long, left_columns long, right_rows long, right_columns long",
      ...     )
      ... )
      >>>
      >>> df.show()
      +---------+------------+----------+-------------+
      |left_rows|left_columns|right_rows|right_columns|
      +---------+------------+----------+-------------+
      |        2|           1|         2|            1|
      |        2|           1|         1|            1|
      +---------+------------+----------+-------------+
      

      The result should be:

      +---------+------------+----------+-------------+
      |left_rows|left_columns|right_rows|right_columns|
      +---------+------------+----------+-------------+
      |        2|           3|         2|            2|
      |        2|           3|         1|            2|
      +---------+------------+----------+-------------+
      

       

      Attachments

        Activity

          People

            ueshin Takuya Ueshin
            ueshin Takuya Ueshin
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: