Kudu / KUDU-2809

Incremental backup / diff scan does not correctly handle rows that are inserted and deleted between two backups


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.10.0
    • Component/s: backup

    Description

      I did the following sequence of operations:

      1. Insert 100 million rows
      2. Update 1 out of every 11 rows
      3. Make a full backup
      4. Insert 100 million more rows, after the original rows in keyspace
      5. Delete 1 out of every 23 rows
      6. Make an incremental backup

      The restore failed to apply the incremental backup, with an error like:

      java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
      

      Due to another bug, the sample errors were missing. After hacking around that bug, I found that the incremental backup contained a row with a DELETE action for a key that is not present in the full backup. That row was inserted in step 4 and deleted in step 5, i.e. after the full backup and before the incremental one, so the restored table has no such key to delete.
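      The failure can be reproduced in miniature with a toy in-memory model. This is a sketch under stated assumptions, not Kudu's diff scan or restore code: each row is reduced to a hypothetical RowHistory (insert timestamp plus optional delete timestamp), the "naive" diff emits every row touched in the interval (t1, t2] with its final state, and the restore treats a DELETE of a missing key as an error. The class, record, and method names, and the timestamps 10 and 20, are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class NaiveIncrementalRestore {
    // Hypothetical history of one row: insert timestamp and optional delete timestamp.
    record RowHistory(long key, long insertTs, Long deleteTs) {
        boolean liveAt(long ts) {
            return insertTs <= ts && (deleteTs == null || deleteTs > ts);
        }
    }

    enum Action { UPSERT, DELETE }

    record DiffEntry(long key, Action action) {}

    // Full backup: the keys of every row that is live at the snapshot time t1.
    static Set<Long> fullBackup(List<RowHistory> rows, long t1) {
        Set<Long> keys = new TreeSet<>();
        for (RowHistory r : rows) {
            if (r.liveAt(t1)) {
                keys.add(r.key());
            }
        }
        return keys;
    }

    // Naive diff over (t1, t2]: every row touched in the interval is emitted with
    // its final state at t2, even if the row did not exist at t1.
    static List<DiffEntry> naiveDiff(List<RowHistory> rows, long t1, long t2) {
        List<DiffEntry> diff = new ArrayList<>();
        for (RowHistory r : rows) {
            boolean inserted = r.insertTs() > t1 && r.insertTs() <= t2;
            boolean deleted = r.deleteTs() != null && r.deleteTs() > t1 && r.deleteTs() <= t2;
            if (inserted || deleted) {
                diff.add(new DiffEntry(r.key(), r.liveAt(t2) ? Action.UPSERT : Action.DELETE));
            }
        }
        return diff;
    }

    // Strict restore: a DELETE for a key that is not in the restored table is an error.
    static void applyStrict(Set<Long> table, List<DiffEntry> diff) {
        for (DiffEntry e : diff) {
            if (e.action() == Action.UPSERT) {
                table.add(e.key());
            } else if (!table.remove(e.key())) {
                throw new IllegalStateException("DELETE for key not present: " + e.key());
            }
        }
    }

    public static void main(String[] args) {
        List<RowHistory> rows = List.of(
                new RowHistory(1, 5, null),  // live at both backups
                new RowHistory(2, 15, 18L),  // inserted and deleted between backups
                new RowHistory(3, 5, 12L));  // in the full backup, deleted afterwards
        long t1 = 10; // full backup
        long t2 = 20; // incremental backup
        Set<Long> restored = fullBackup(rows, t1);
        List<DiffEntry> diff = naiveDiff(rows, t1, t2);
        System.out.println("incremental: " + diff);
        try {
            applyStrict(restored, diff);
            System.out.println("restore succeeded");
        } catch (IllegalStateException ex) {
            System.out.println("restore failed: " + ex.getMessage());
        }
    }
}
```

      Key 3 is a legitimate DELETE because it is in the full backup; key 2 is the problematic case from the description, since it appears in neither backup yet still produces a DELETE.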

      We could fix this by:

      1. Making the diff scan not return a DELETE for such a row
      2. Implementing DELETE IGNORE and using it in the restore job
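      The two proposed fixes can be sketched with a toy in-memory model. This is a hypothetical illustration, not Kudu's implementation or client API: rows are reduced to an insert timestamp and an optional delete timestamp, fixedDiff models fix 1 by emitting nothing for a row that was absent at t1 and is still absent at t2, and applyDeleteIgnore models fix 2 by treating a DELETE of a missing key as a no-op. All names and timestamps here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class IncrementalRestoreFixes {
    // Hypothetical history of one row: insert timestamp and optional delete timestamp.
    record RowHistory(long key, long insertTs, Long deleteTs) {
        boolean liveAt(long ts) {
            return insertTs <= ts && (deleteTs == null || deleteTs > ts);
        }

        boolean changedIn(long t1, long t2) {
            boolean inserted = insertTs > t1 && insertTs <= t2;
            boolean deleted = deleteTs != null && deleteTs > t1 && deleteTs <= t2;
            return inserted || deleted;
        }
    }

    enum Action { UPSERT, DELETE }

    record DiffEntry(long key, Action action) {}

    // Fix 1 (sketch): a row absent at t1 and still absent at t2 has no net change,
    // so the diff over (t1, t2] emits nothing for it.
    static List<DiffEntry> fixedDiff(List<RowHistory> rows, long t1, long t2) {
        List<DiffEntry> diff = new ArrayList<>();
        for (RowHistory r : rows) {
            if (!r.changedIn(t1, t2)) {
                continue;
            }
            boolean before = r.liveAt(t1);
            boolean after = r.liveAt(t2);
            if (!before && !after) {
                continue; // inserted and deleted between the two backups
            }
            diff.add(new DiffEntry(r.key(), after ? Action.UPSERT : Action.DELETE));
        }
        return diff;
    }

    // Fix 2 (sketch): DELETE IGNORE semantics, where deleting a missing key is not an error.
    static void applyDeleteIgnore(Set<Long> table, List<DiffEntry> diff) {
        for (DiffEntry e : diff) {
            if (e.action() == Action.UPSERT) {
                table.add(e.key());
            } else {
                table.remove(e.key()); // returns false for a missing key; ignored
            }
        }
    }

    public static void main(String[] args) {
        List<RowHistory> rows = List.of(
                new RowHistory(1, 5, null),  // live at both backups
                new RowHistory(2, 15, 18L),  // inserted and deleted between backups
                new RowHistory(3, 5, 12L),   // in the full backup, deleted afterwards
                new RowHistory(4, 15, null)); // inserted after the full backup
        Set<Long> restored = new TreeSet<>(List.of(1L, 3L)); // keys in the full backup at t1 = 10
        List<DiffEntry> diff = fixedDiff(rows, 10, 20);
        System.out.println("incremental: " + diff);
        applyDeleteIgnore(restored, diff);
        System.out.println("restored keys: " + restored);
    }
}
```

      In this model either fix alone avoids the failure: fix 1 keeps the bogus DELETE out of the incremental backup, while fix 2 lets the restore tolerate it if one is present.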


          People

            Adar Dembo (adar)
            William Berkeley (wdberkeley)
            Votes: 0
            Watchers: 5
