Spark / SPARK-35801

SPIP: Row-level operations in Data Source V2


Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Row-level operations such as UPDATE, DELETE, and MERGE are increasingly important for modern big data workflows. Use cases include, but are not limited to, deleting a set of records for regulatory compliance, updating a set of records to fix an issue in the ingestion pipeline, and applying changes from a transaction log to a fact table. Row-level operations let users express these use cases directly; without them, the same logic requires considerably more SQL. Common patterns for updating partitions are to read, union, and overwrite, or to read, diff, and append. With commands like MERGE, these operations are easier to express and can be more efficient to run.
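
      To make the contrast concrete, here is a minimal plain-Python sketch (not Spark code, and not part of the SPIP) that compares the manual "read, union, and overwrite" pattern with MERGE-like semantics on an in-memory partition. The function names, the row layout (`id`, `value`), and the `op` field on change records are all hypothetical, and the sketch assumes at most one change per `id`.

```python
# Illustration only: lists of dicts stand in for a table partition and a
# batch of changes. All names here are hypothetical, not Spark APIs.

def read_union_overwrite(partition, changes):
    """Manual pattern: read the partition, union it with the changes, overwrite."""
    changed_ids = {c["id"] for c in changes}
    # Read: keep only the rows that no change touches.
    kept = [row for row in partition if row["id"] not in changed_ids]
    # Union: add the new versions of updated or inserted rows.
    upserts = [
        {"id": c["id"], "value": c["value"]}
        for c in changes
        if c["op"] != "delete"
    ]
    # Overwrite: the returned list replaces the whole partition.
    return sorted(kept + upserts, key=lambda r: r["id"])


def merge(target, changes):
    """MERGE-like semantics: each change is matched against the target by id."""
    by_id = {row["id"]: dict(row) for row in target}
    for c in changes:
        if c["op"] == "delete":
            # Matched and marked for deletion: drop the row.
            by_id.pop(c["id"], None)
        else:
            # Matched: update the row. Not matched: insert it.
            by_id[c["id"]] = {"id": c["id"], "value": c["value"]}
    return sorted(by_id.values(), key=lambda r: r["id"])


partition = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 3, "value": "c"},
]
changes = [
    {"id": 2, "op": "update", "value": "B"},
    {"id": 3, "op": "delete", "value": None},
    {"id": 4, "op": "insert", "value": "d"},
]

manual_result = read_union_overwrite(partition, changes)
merge_result = merge(partition, changes)
```

      Both functions produce the same rows. The difference is that the manual pattern rewrites the whole partition and leaves the filtering and union logic to the caller, while the MERGE-style version states the intent per row.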

      Hive supports MERGE and Spark should implement similar support.

      SPIP: https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60


          People

            Unassigned
            Anton Okolnychyi (aokolnychyi)
            L. C. Hsieh
