  1. Spark
  2. SPARK-27463

Support Dataframe Cogroup via Pandas UDFs

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      Recent work on Pandas UDFs in Spark has improved interoperability between Pandas and Spark. This proposal aims to extend that work by introducing a new Pandas UDF type that allows a cogroup operation to be applied to two PySpark DataFrames.

      Full details are in the Google document linked below.
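For readers unfamiliar with cogroup, the semantics can be sketched locally in plain pandas: both frames are grouped by the same key, and a user function is called once per key with the matching group from each side. This is only an illustration under stated assumptions, not the design from the linked document. The helper `cogroup_apply`, the function `summarise`, and the column names `id`, `v1`, `v2` are all hypothetical.

```python
import pandas as pd


def cogroup_apply(left, right, key, func):
    """Call func(left_group, right_group) once per distinct key value.

    Hypothetical helper that mimics cogroup semantics locally in pandas;
    it is not part of Spark or pandas.
    """
    left_groups = {k: g for k, g in left.groupby(key)}
    right_groups = {k: g for k, g in right.groupby(key)}
    # Take the union of keys so a key present on only one side is still visited.
    all_keys = sorted(set(left_groups) | set(right_groups))
    results = []
    for k in all_keys:
        # A missing side is represented by an empty frame with the same columns.
        l = left_groups.get(k, left.iloc[0:0])
        r = right_groups.get(k, right.iloc[0:0])
        results.append(func(l, r))
    return pd.concat(results, ignore_index=True)


def summarise(l, r):
    """Example user function: count the rows each side contributes for a key."""
    key_value = l["id"].iloc[0] if len(l) else r["id"].iloc[0]
    return pd.DataFrame(
        {"id": [key_value], "n_left": [len(l)], "n_right": [len(r)]}
    )


left = pd.DataFrame({"id": [1, 1, 2], "v1": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"id": [1, 3], "v2": ["a", "b"]})
out = cogroup_apply(left, right, "id", summarise)
print(out)
```

In the Spark 3.0 release the corresponding PySpark call takes the form `df1.groupby("id").cogroup(df2.groupby("id")).applyInPandas(func, schema)`, where `func` receives one pandas DataFrame per side and returns a pandas DataFrame matching `schema`.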

       

        Issue Links

            People

              Assignee: Chris Martin (d80tb7)
              Reporter: Chris Martin (d80tb7)
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:
                Resolved: