Spark / SPARK-6817 (DataFrame UDFs in R) / SPARK-12919

Implement dapply() on DataFrame in SparkR


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: SparkR
    • Labels: None

    Description

      dapply() applies an R function to each partition of a DataFrame and returns a new DataFrame.

      The function signature is:

      	dapply(df, function(localDF) {}, schema = NULL)
      

      R function input: a local data.frame holding the rows of one partition, on the node where that partition is processed
      R function output: a local data.frame
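
      To make the per-partition contract concrete, here is a base-R simulation that runs without Spark. It is a sketch under stated assumptions, not the SparkR implementation: the helper local_dapply(), the contiguous row split and the num_partitions argument are all hypothetical stand-ins for how Spark actually partitions a DataFrame. Each simulated partition is handed to the function as a local data.frame, and the returned data.frames are row-bound into the result.

```r
# Hypothetical helper local_dapply(): a base-R simulation of the dapply()
# contract, NOT SparkR code. Rows are split into contiguous "partitions",
# func() is called once per partition, and the outputs are row-bound.
local_dapply <- function(df, func, num_partitions = 2L) {
  # Assign each row a partition id 1..num_partitions (contiguous blocks)
  ids <- cut(seq_len(nrow(df)), breaks = num_partitions, labels = FALSE)
  parts <- split(df, ids)
  results <- lapply(parts, function(localDF) {
    out <- func(localDF)               # input: one partition's rows
    stopifnot(is.data.frame(out))      # output must be a local data.frame
    out
  })
  out <- do.call(rbind, unname(results))
  rownames(out) <- NULL
  out
}

df <- data.frame(a = 1:6, b = c(2.5, 1.0, 4.0, 3.5, 0.5, 2.0))

# Row-wise transformation: add a derived column in every partition
result <- local_dapply(df, function(localDF) {
  localDF$c <- localDF$a * localDF$b
  localDF
}, num_partitions = 3L)

# The function only ever sees one partition: here it reports its row count
counts <- local_dapply(df, function(localDF) data.frame(n = nrow(localDF)),
                       num_partitions = 3L)
print(counts)  # n is 2 for each of the 3 simulated partitions
```

      The second call shows why partition boundaries matter: any aggregate computed inside the function is per partition, not over the whole DataFrame.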

      The schema specifies the Row format of the resulting DataFrame and must match the output of the R function.
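
      The requirement that the schema match the function's output can be illustrated with a small check. This is an assumption-laden sketch: the schema is modelled as a named character vector of R column classes, a hypothetical stand-in for a Spark schema, and check_schema() is a hypothetical helper, not part of SparkR. It verifies column names, their order, and each column's class.

```r
# Hypothetical stand-in for a Spark schema: named vector of expected R classes
schema <- c(a = "integer", b = "numeric", c = "numeric")

# Hypothetical helper: returns TRUE if the data.frame matches the schema,
# otherwise stops with a message naming the offending columns.
check_schema <- function(out, schema) {
  if (!identical(names(out), names(schema))) {
    stop("column names ", paste(names(out), collapse = ", "),
         " do not match schema ", paste(names(schema), collapse = ", "))
  }
  # First class of each column, e.g. "integer", "numeric", "character"
  actual <- vapply(out, function(col) class(col)[1], character(1))
  bad <- actual != schema
  if (any(bad)) {
    stop("type mismatch in column(s): ",
         paste(names(schema)[bad], collapse = ", "))
  }
  TRUE
}

good <- data.frame(a = 1:2, b = c(0.5, 1.5), c = c(1, 3))
bad <- data.frame(a = 1:2, b = c("x", "y"), c = c(1, 3),
                  stringsAsFactors = FALSE)

check_schema(good, schema)
msg <- tryCatch(check_schema(bad, schema),
                error = function(e) conditionMessage(e))
```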

      If schema is not specified, the output of each partition of the resulting DataFrame is serialized in R into a single byte array. Such a DataFrame can be processed by successive calls to dapply() or by collect(), but not by normal DataFrame operations.
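
      The schema-less path can be sketched in base R as well. Base R's serialize(x, connection = NULL) returns a raw vector, which fits the "single byte array" per partition described above; whether SparkR uses exactly this call is an assumption. The helpers serialize_partition(), apply_without_schema() and collect_raw() are hypothetical and only mimic the flow: a further dapply() step must deserialize, apply and re-serialize, and collect() deserializes every partition and row-binds the results locally.

```r
# Simulated partitions of a source DataFrame (hypothetical data)
partitions <- list(
  data.frame(a = 1:2, b = c(0.5, 1.5)),
  data.frame(a = 3:4, b = c(2.5, 3.5))
)

# Without a schema, each partition's output becomes one raw (byte) vector
serialize_partition <- function(localDF) serialize(localDF, connection = NULL)

# Hypothetical schema-less dapply step: deserialize, apply func, re-serialize
apply_without_schema <- function(raw_parts, func) {
  lapply(raw_parts, function(bytes) serialize_partition(func(unserialize(bytes))))
}

# Hypothetical collect(): deserialize every partition and row-bind locally
collect_raw <- function(raw_parts) {
  out <- do.call(rbind, lapply(raw_parts, unserialize))
  rownames(out) <- NULL
  out
}

add_c <- function(localDF) {
  localDF$c <- localDF$a * localDF$b
  localDF
}

# First dapply without a schema: outputs are opaque byte arrays
raw_parts <- lapply(partitions, function(p) serialize_partition(add_c(p)))
# A successive dapply can still operate on them (filter rows with c > 1)
raw_parts <- apply_without_schema(raw_parts, function(localDF) localDF[localDF$c > 1, ])
collected <- collect_raw(raw_parts)
```

      Because the intermediate result is just raw bytes, Spark cannot see column names or types, which is why ordinary DataFrame operations are unavailable on it.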

          People

            Assignee: Sun Rui
            Reporter: Sun Rui
            Votes: 0
            Watchers: 6