Spark / SPARK-26759 Arrow optimization in SparkR's interoperability / SPARK-26858

Vectorized gapplyCollect, Arrow optimization in native R function execution

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SparkR, SQL
    • Labels: None

    Description

      Unlike gapply, gapplyCollect requires additional ser/de steps because its schema can be omitted, so Spark SQL does not know the return type before execution actually happens.

      In the original code path, this is handled by using a binary schema. Once vectorized gapply is done (SPARK-26761), we can mimic this approach in vectorized gapply to support gapplyCollect.
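
      The difference between the two paths can be illustrated with a small conceptual sketch. It is plain Python, not SparkR or Spark internals: `pickle` stands in for R's serialization, and the names `gapply_with_schema`, `gapply_collect`, `group_rows`, and `mean_by_key` are hypothetical helpers invented for this illustration. It assumes only what the description states: with a declared schema the engine can type the output rows, and without one each group's result travels as an opaque binary value that is deserialized after collection.

```python
import pickle
from collections import defaultdict


def group_rows(rows, key):
    """Group a list of dict rows by the value of `key` (insertion order kept)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def gapply_with_schema(rows, key, func, schema):
    """Schema is declared up front, so every output row can be type-checked."""
    out = []
    for k, group in group_rows(rows, key).items():
        for result in func(k, group):
            if set(result) != set(schema):
                raise ValueError("columns do not match the declared schema")
            for name, typ in schema.items():
                if not isinstance(result[name], typ):
                    raise TypeError(f"column {name!r} is not {typ.__name__}")
            out.append(result)
    return out


def gapply_collect(rows, key, func):
    """No schema: each group's result is shipped as one opaque binary value."""
    # "Executor" side: serialize whatever the function returned.
    binary_column = [
        pickle.dumps(func(k, group))
        for k, group in group_rows(rows, key).items()
    ]
    # "Driver" side: the extra ser/de step recovers the rows after collection.
    collected = []
    for blob in binary_column:
        collected.extend(pickle.loads(blob))
    return collected


def mean_by_key(k, group):
    """Example user function: one output row per group."""
    values = [row["v"] for row in group]
    return [{"k": k, "avg": sum(values) / len(values)}]


rows = [{"k": "a", "v": 1.0}, {"k": "a", "v": 3.0}, {"k": "b", "v": 5.0}]
typed = gapply_with_schema(rows, "k", mean_by_key, {"k": str, "avg": float})
untyped = gapply_collect(rows, "k", mean_by_key)
```

      Both calls yield the same rows here; the point of the sketch is that the second path never needs to know the column types ahead of time, at the cost of an extra serialize/deserialize round trip.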

          People

            Assignee: Hyukjin Kwon (gurwls223)
            Reporter: Hyukjin Kwon (gurwls223)
            Votes: 0
            Watchers: 2
