HIVE-22336

Updates should be pushed to the Metastore backend DB before creating the notification event

Details

    Description

      On HDP-3.1, a table couldn't be deleted because some related objects (such as the storage descriptor) were missing from the metastore. An earlier delete attempt on that table had failed without a rollback, which is why the SD was missing. In that earlier attempt, the notification creation swallowed the error coming from the backend DB, so no rollback was triggered. These are the steps of the first delete attempt:

      1. Open a transaction (transaction_1) - this step was successful
      2. Delete all the objects which are related to the table - this step was successful too, so the SD and other objects were deleted
      3. Delete the table - this step failed in the backend DB, but according to the log the delete is part of a batch statement, so it is not necessarily executed at this moment and no error is visible here
      4. Create a notification about the table delete:
        1. Open another transaction for the notification creation (transaction_2) - this calls the ObjectStore.openTransaction method, which increments a counter of open transactions and then checks whether there is already an active transaction. If there is, it just returns true and doesn't actually create a new transaction.
        2. Lock the notification id in the metastore backend DB for update - this is where the exception from the backend DB (let's call it "MySQL Exception") manifests
        3. If an exception occurs while acquiring the lock, retry - the "MySQL Exception" was caught, and since the exception is not checked, the retry mechanism assumes it happened because the lock for the notification id couldn't be acquired, so it retries and "forgets" about the "MySQL Exception".
        4. If the lock was acquired successfully, create the notification - the second time the lock was acquired successfully, so the notification creation succeeded.
        5. Commit transaction_2 - this just decrements the transaction counter and doesn't actually commit anything.
      5. Commit transaction_1 - this commits the transaction, but since the error has already manifested and been "handled", no error shows up here; the commit is reported as successful, so no rollback happens and the table object is left in an invalid state.
      6. If the commit was not successful then rollback
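
The sequence above can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyStore, its methods and the statement strings are hypothetical stand-ins, not the real ObjectStore API, and it assumes that a failed batch is discarded once its error has been raised. It shows how a retry loop that catches every exception can hide a deferred write failure, so the outer commit still reports success.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowedErrorDemo {

    /** Hypothetical toy store: nested transactions are a counter, some writes are batched. */
    static class ToyStore {
        int openTransactions = 0;
        boolean committed = false;
        final List<String> applied = new ArrayList<>(); // statements the "DB" executed
        final List<String> batched = new ArrayList<>(); // statements waiting for a flush

        boolean openTransaction() {
            openTransactions++;
            return true; // an already active transaction is simply reused
        }

        void execute(String statement) { applied.add(statement); }

        void executeLater(String statement) { batched.add(statement); }

        /** Pushes batched statements to the "DB"; the table delete is hard-wired to fail. */
        void flush() {
            List<String> batch = new ArrayList<>(batched);
            batched.clear(); // assumption: a failed batch is not replayed
            for (String s : batch) {
                if (s.equals("delete table")) {
                    throw new IllegalStateException("MySQL Exception");
                }
                applied.add(s);
            }
        }

        /** Assumption: running the lock query first flushes pending writes. */
        void lockNotificationId() {
            flush();
            applied.add("lock notification id");
        }

        boolean commitTransaction() {
            openTransactions--;
            if (openTransactions > 0) {
                return true; // inner commit: only the counter changes
            }
            flush();
            committed = true;
            return true;
        }
    }

    /** Step 4: retries on any exception without checking what actually went wrong. */
    static void addNotification(ToyStore store, int maxRetries) {
        store.openTransaction(); // "transaction_2", really just a counter bump
        boolean locked = false;
        for (int attempt = 1; attempt <= maxRetries && !locked; attempt++) {
            try {
                store.lockNotificationId();
                locked = true;
            } catch (RuntimeException e) {
                // Swallowed: treated as a lock problem, the real cause is lost.
            }
        }
        if (!locked) {
            throw new IllegalStateException("could not lock notification id");
        }
        store.execute("insert notification");
        store.commitTransaction();
    }

    static ToyStore runDropTable() {
        ToyStore store = new ToyStore();
        store.openTransaction();                    // step 1
        store.execute("delete storage descriptor"); // step 2
        store.executeLater("delete table");         // step 3 (batched)
        addNotification(store, 3);                  // step 4
        store.commitTransaction();                  // step 5
        return store;
    }

    public static void main(String[] args) {
        ToyStore store = runDropTable();
        System.out.println("committed = " + store.committed);
        System.out.println("applied   = " + store.applied);
    }
}
```

In this model the commit succeeds even though "delete table" never reaches the applied list, while the storage descriptor delete does, which mirrors the inconsistent state described in the steps.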

      In the customer setup, this issue could be fixed by adding a flush call before creating the notification event, so that all updates are pushed to the backend DB and the error manifests at that point. The error then propagates back to the HiveMetastore class, which does the rollback, and the delete table operation fails as it should, since the table couldn't be deleted. The HiveMetastore retry mechanism can then attempt the table deletion again.
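
A minimal sketch of the effect of flushing first, again using a hypothetical toy model rather than the real ObjectStore or HiveMetastore code (ToyStore, dropTable and the statement strings are all assumptions made for illustration). The flush sits outside the retry loop, so the backend error is not mistaken for a lock problem and reaches the caller, which rolls back.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushBeforeNotificationDemo {

    /** Hypothetical toy store with batched writes, commit and rollback. */
    static class ToyStore {
        final List<String> applied = new ArrayList<>(); // uncommitted work done so far
        final List<String> batched = new ArrayList<>(); // statements waiting for a flush
        boolean committed = false;
        boolean rolledBack = false;

        void execute(String statement) { applied.add(statement); }

        void executeLater(String statement) { batched.add(statement); }

        /** Pushes batched statements to the "DB"; the table delete is hard-wired to fail. */
        void flush() {
            List<String> batch = new ArrayList<>(batched);
            batched.clear();
            for (String s : batch) {
                if (s.equals("delete table")) {
                    throw new IllegalStateException("MySQL Exception");
                }
                applied.add(s);
            }
        }

        void lockNotificationId() {
            flush();
            applied.add("lock notification id");
        }

        void commit() {
            flush();
            committed = true;
        }

        /** Discards all uncommitted work, including the storage descriptor delete. */
        void rollback() {
            applied.clear();
            batched.clear();
            rolledBack = true;
        }
    }

    static void addNotification(ToyStore store, int maxRetries) {
        // Flush before the retry loop, so a backend error propagates to the caller.
        store.flush();
        boolean locked = false;
        for (int attempt = 1; attempt <= maxRetries && !locked; attempt++) {
            try {
                store.lockNotificationId();
                locked = true;
            } catch (RuntimeException e) {
                // Only failures of the lock query itself end up here now.
            }
        }
        if (!locked) {
            throw new IllegalStateException("could not lock notification id");
        }
        store.execute("insert notification");
    }

    /** Returns true if the drop was committed, false if it was rolled back. */
    static boolean dropTable(ToyStore store) {
        try {
            store.execute("delete storage descriptor");
            store.executeLater("delete table");
            addNotification(store, 3);
            store.commit();
            return true;
        } catch (RuntimeException e) {
            store.rollback(); // the caller sees the error and undoes everything
            return false;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        System.out.println("dropped    = " + dropTable(store));
        System.out.println("rolledBack = " + store.rolledBack);
    }
}
```

Here the drop fails as a whole and nothing is left half-deleted, so a later retry of the table deletion starts from a consistent state.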

      Attachments

        1. HIVE-22336.3.patch
          0.8 kB
          Marta Kuczora
        2. HIVE-22336.2.patch
          0.8 kB
          Marta Kuczora
        3. HIVE-22336.1.patch
          0.8 kB
          Marta Kuczora

            People

              kuczoram Marta Kuczora