Spark / SPARK-35801

SPIP: Row-level operations in Data Source V2

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Row-level operations such as UPDATE, DELETE, and MERGE are increasingly important in modern Big Data workflows. Use cases include deleting a set of records for regulatory compliance, updating a set of records to fix an issue in the ingestion pipeline, and applying changes from a transaction log to a fact table. Without row-level operations, users typically update partitions by hand, following one of two patterns: read, union, and overwrite; or read, diff, and append. Commands like MERGE express the same use cases in far less SQL and can be more efficient to run.

      Hive already supports MERGE, and Spark should provide similar support.
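
      To make the contrast between the manual pattern and MERGE concrete, here is a minimal sketch in plain Python. It is not Spark code and not the API proposed in the SPIP: rows are modeled as dictionaries, and the function names `read_union_overwrite` and `merge_upsert`, the `id` key, and the sample data are all hypothetical. It also assumes each key appears at most once in the incoming changes.

```python
from typing import Dict, List

Row = Dict[str, object]


def read_union_overwrite(target: List[Row], changes: List[Row], key: str) -> List[Row]:
    """Manual pattern: read the target, drop rows that are superseded,
    union with the changes, and overwrite the whole data set."""
    changed_keys = {row[key] for row in changes}
    kept = [row for row in target if row[key] not in changed_keys]
    return kept + list(changes)


def merge_upsert(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """MERGE-like semantics, roughly equivalent to this illustrative SQL:

        MERGE INTO target t USING source s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET t.v = s.v
        WHEN NOT MATCHED THEN INSERT (id, v) VALUES (s.id, s.v)

    Assumes keys are unique within `source`.
    """
    source_by_key = {row[key]: row for row in source}
    matched = set()
    result: List[Row] = []
    for row in target:
        k = row[key]
        if k in source_by_key:
            # WHEN MATCHED: overlay the source values on the target row.
            result.append({**row, **source_by_key[k]})
            matched.add(k)
        else:
            # Untouched target rows are carried over unchanged.
            result.append(row)
    # WHEN NOT MATCHED: insert source rows that had no target counterpart.
    result.extend(row for row in source if row[key] not in matched)
    return result


if __name__ == "__main__":
    target = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    changes = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
    print(read_union_overwrite(target, changes, "id"))
    print(merge_upsert(target, changes, "id"))
```

      Both functions produce the same rows for this input. The difference is that the manual version leaves the user to spell out the anti-join, union, and overwrite steps, while a MERGE statement declares the intent and lets the engine and the data source choose how to apply it.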

      SPIP: https://docs.google.com/document/d/12Ywmc47j3l2WF4anG5vL4qlrhT2OKigb7_EbIKhxg60

        Issue Links

          1. DataSource V2: Add APIs for group-based row-level operations (Sub-task; Resolved; Anton Okolnychyi)
          2. DataSource V2: Handle DELETE commands for group-based sources (Sub-task; Resolved; Anton Okolnychyi)
          3. Make condition in DeleteFromTable required (Sub-task; Resolved; Anton Okolnychyi)
          4. DataSource V2: Support runtime group filtering in row-level commands (Sub-task; Resolved; Anton Okolnychyi)
          5. DataSource V2: Handle DELETE commands for delta-based sources (Sub-task; Resolved; Anton Okolnychyi)
          6. DataSource V2: Add APIs for delta-based row-level operations (Sub-task; Resolved; Anton Okolnychyi)
          7. Align UPDATE assignments with table attributes (Sub-task; Resolved; Anton Okolnychyi)
          8. Align MERGE assignments with table attributes (Sub-task; Resolved; Anton Okolnychyi)
          9. DataSource V2: Handle UPDATE commands for delta-based sources (Sub-task; Resolved; Anton Okolnychyi)
          10. DataSource V2: Allow representing updates as deletes and inserts (Sub-task; Resolved; Anton Okolnychyi)
          11. DataSource V2: Handle MERGE commands for delta-based sources (Sub-task; Resolved; Anton Okolnychyi)
          12. Make RowLevelOperationSuiteBase and AlignAssignmentsSuite abstract (Sub-task; Resolved; Anton Okolnychyi)
          13. DataSource V2: Handle MERGE commands for group-based sources (Sub-task; Resolved; Anton Okolnychyi)
          14. DataSource V2: Handle UPDATE commands for group-based sources (Sub-task; Resolved; Anton Okolnychyi)
          15. Add join hint to disable broadcasting and replicating one side of a join (Sub-task; Resolved; Anton Okolnychyi)
          16. Add benchmark for MergeRowsExec (Sub-task; Open; Unassigned)
          17. Reuse main scan exchange in group-based UPDATEs (Sub-task; Open; Unassigned)
          18. Prohibit non-deterministic expressions, subqueries and aggregates in MERGE conditions (Sub-task; Resolved; Anton Okolnychyi)
          19. Discard completely pushed down filters in group-based MERGE operations (Sub-task; Resolved; Anton Okolnychyi)
          20. Support schema pruning in delta-based MERGE operations (Sub-task; Resolved; Anton Okolnychyi)
          21. Add tests for schema pruning in delta-based UPDATEs (Sub-task; Resolved; Anton Okolnychyi)
          22. Add tests for schema pruning in delta-based DELETEs (Sub-task; Resolved; Anton Okolnychyi)
          23. Add docs for MergeRows node (Sub-task; Open; Unassigned)
          24. Reduce code duplication in group-based DELETE and MERGE tests (Sub-task; Open; Unassigned)

            People

              Unassigned
              Anton Okolnychyi (aokolnychyi)
              L. C. Hsieh
              Votes: 0
              Watchers: 16

              Dates

                Created:
                Updated: