IMPALA-12708

An UPDATE creates 2 new snapshots in Iceberg tables

Details

    Description

      The UPDATE statement is now supported for Iceberg tables in Impala.

      The implementation creates the delete file(s) and the new data file(s) for the updated row(s). These files are committed in one Iceberg transaction, but the transaction adds two snapshots to the table. The first contains the delete file(s), the second adds the new data file(s) of the updated row(s). 

      This results in an unusual table history. The first (temporary) snapshot of the transaction has no time information associated with it, because the table spends zero time in that state. It therefore does not appear as a separate entry when the table history is queried, and it cannot be reached with time travel based on system time. However, it does appear in the history as the parent of the current snapshot, and it can be queried by snapshot ID, which returns the results of an invalid table state.

      Impala should create only one new snapshot per UPDATE statement, so that the parent of the current snapshot is the previous valid table state.
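
      To make the difference concrete, here is a minimal toy model of the snapshot chain. It is only a sketch of the behavior described in this ticket, not Impala or Iceberg code: the names `Snapshot`, `ToyTable`, `update_two_snapshots`, `update_one_snapshot` and `parent_rows` are hypothetical, and rows are modeled as plain (id, value) tuples rather than data and delete files.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, str]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int
    rows: Tuple[Row, ...]  # rows visible in this table state


class ToyTable:
    """Hypothetical stand-in for a table: a snapshot map plus a history log."""

    def __init__(self, rows: List[Row]) -> None:
        self.snapshots: Dict[int, Snapshot] = {1: Snapshot(1, None, 1000, tuple(rows))}
        self.current_id = 1
        self.history: List[int] = [1]  # snapshots that were ever the current state

    def _add_snapshot(self, rows: List[Row], timestamp_ms: int) -> int:
        # Every new snapshot points at the snapshot that was current before it.
        new_id = max(self.snapshots) + 1
        self.snapshots[new_id] = Snapshot(new_id, self.current_id, timestamp_ms, tuple(rows))
        self.current_id = new_id
        return new_id

    def update_two_snapshots(self, row_id: int, value: str, timestamp_ms: int) -> None:
        """Behavior reported in this ticket: one transaction, two snapshots."""
        remaining = [r for r in self.snapshots[self.current_id].rows if r[0] != row_id]
        # First snapshot: only the deletes are applied, so the updated row is missing.
        self._add_snapshot(remaining, timestamp_ms)
        # Second snapshot: the rewritten row is added back.
        final_id = self._add_snapshot(remaining + [(row_id, value)], timestamp_ms)
        # Only the final snapshot becomes a history entry.
        self.history.append(final_id)

    def update_one_snapshot(self, row_id: int, value: str, timestamp_ms: int) -> None:
        """Requested behavior: deletes and new rows land in a single snapshot."""
        remaining = [r for r in self.snapshots[self.current_id].rows if r[0] != row_id]
        final_id = self._add_snapshot(remaining + [(row_id, value)], timestamp_ms)
        self.history.append(final_id)


def parent_rows(table: ToyTable) -> Tuple[Row, ...]:
    """Rows seen when reading the parent of the current snapshot by ID."""
    current = table.snapshots[table.current_id]
    return table.snapshots[current.parent_id].rows


if __name__ == "__main__":
    reported = ToyTable([(1, "a"), (2, "b")])
    reported.update_two_snapshots(2, "B", 2000)
    print("two snapshots, parent state:", parent_rows(reported))

    requested = ToyTable([(1, "a"), (2, "b")])
    requested.update_one_snapshot(2, "B", 2000)
    print("one snapshot, parent state:", parent_rows(requested))
```

      In the two-snapshot variant, the parent of the current snapshot is a state in which the updated row is missing, and that parent never shows up as a history entry. In the one-snapshot variant, the parent is the table state from before the UPDATE.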

          People

            boroknagyz Zoltán Borók-Nagy
            noemi Noemi Pap-Takacs
