[SPARK-35801] SPIP: Row-level operations in Data Source V2 - ASF JIRA

Details

Type: Improvement
Status: In Progress
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.0
Fix Version/s: None
Component/s: SQL
Labels:
- SPIP

Description

Row-level operations such as UPDATE, DELETE, and MERGE are increasingly important for modern big data workflows. Use cases include, but are not limited to, deleting a set of records for regulatory compliance, updating a set of records to fix an issue in the ingestion pipeline, and applying changes from a transaction log to a fact table. Row-level operations let users express these use cases directly, where they would otherwise require much more SQL. Without such commands, the common patterns for updating partitions are to read, union, and overwrite, or to read, diff, and append. With commands like MERGE, these operations are easier to express and can be more efficient to run.

Hive supports MERGE and Spark should implement similar support.
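
To make the "applying changes from a transaction log" use case concrete, here is a minimal, engine-independent sketch in plain Python of what a MERGE is expected to do to a set of rows. It is not Spark code and not the API proposed in the SPIP; the function name `merge_rows`, the `id` key, and the `op` and `amount` columns are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_rows(target: List[Row], changes: List[Row], key: str = "id") -> List[Row]:
    """Apply a change log to target rows with MERGE-like semantics.

    Roughly mirrors this statement (table and column names are hypothetical):

        MERGE INTO target t USING changes c ON t.id = c.id
        WHEN MATCHED AND c.op = 'delete' THEN DELETE
        WHEN MATCHED THEN UPDATE SET t.amount = c.amount
        WHEN NOT MATCHED AND c.op <> 'delete' THEN INSERT (id, amount)
            VALUES (c.id, c.amount)
    """
    # Index the target by key so each change touches only the row it matches.
    by_key = {row[key]: dict(row) for row in target}
    for change in changes:
        k = change[key]
        if change.get("op") == "delete":
            # Matched deletes remove the row; unmatched deletes are ignored.
            by_key.pop(k, None)
        else:
            # Drop the bookkeeping column, then update the match or insert.
            new_values = {c: v for c, v in change.items() if c != "op"}
            by_key[k] = {**by_key.get(k, {}), **new_values}
    return sorted(by_key.values(), key=lambda r: r[key])


target = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
changes = [
    {"id": 2, "amount": 25, "op": "update"},
    {"id": 3, "op": "delete"},
    {"id": 4, "amount": 40, "op": "insert"},
]
result = merge_rows(target, changes)
```

Here `result` holds rows 1, 2 (with the updated amount), and 4. Without a MERGE command, the same outcome is typically assembled by hand from the read, union, and overwrite steps described above.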

SPIP: https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60

Issue Links

is a parent of

SPARK-38085 DataSource V2: Handle DELETE commands for group-based sources (Resolved)

is related to

SPARK-44111 Prepare Apache Spark 4.0.0 (Open)

links to

[Github] Pull Request #33008 (aokolnychyi)

Sub-Tasks

1. Add benchmark for MergeRowsExec (Open, Unassigned)
2. Reuse main scan exchange in group-based UPDATEs (Open, Unassigned)
3. Add docs for MergeRows node (Open, Unassigned)
4. Reduce code duplication in group-based DELETE and MERGE tests (Open, Unassigned)

People

Assignee: Unassigned
Reporter: Anton Okolnychyi
Shepherd: L. C. Hsieh
Votes: 0
Watchers: 16

Dates

Created: 17/Jun/21 20:21
Updated: 22/Jun/23 20:17
