Hive / HIVE-26576

Alter table calls on Iceberg tables can inadvertently change metadata_location

Details

    Description

      Concurrent alter_table calls can interfere with each other and leave the metadata_location property of an Iceberg table pointing at a stale metadata file.

      Hive takes no table-level lock on Iceberg tables during the usual operations, which makes some extra performance-related features available, such as concurrent inserts, that native Hive tables lack. This was done under the assumption that the optimistic locking pattern used in HiveTableOperations, which takes an HMS table lock only there, is enough to protect changes to metadata_location.

      This is fine until other alter_table calls enter the system, such as those from StatTask or DDLTask. Such tasks perform their work as follows:

      • get the current table
      • do the alteration
      • send the changes via alter_table call to HMS

      Between the retrieval of the table and the alter_table call, a legitimate commit from HiveTableOperations might advance metadata_location. That update then gets reverted, because these tasks send an outdated metadata_location and the alter_table call overwrites all table properties, including this one.
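
      The interleaving described above can be reproduced with a toy model. The sketch below does not use any Hive or Iceberg API: FakeMetastore is a hypothetical in-memory stand-in for HMS whose alterTable replaces all properties wholesale, and the file names and the numRows value are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    /** Hypothetical stand-in for HMS: alterTable replaces ALL table properties. */
    static class FakeMetastore {
        private Map<String, String> props = new HashMap<>();

        synchronized Map<String, String> getTable() {
            return new HashMap<>(props); // callers get a private copy
        }

        synchronized void alterTable(Map<String, String> newProps) {
            props = new HashMap<>(newProps);
        }
    }

    /** Replays the race and returns the metadata_location that ends up in the metastore. */
    static String run() {
        FakeMetastore hms = new FakeMetastore();
        Map<String, String> initial = new HashMap<>();
        initial.put("metadata_location", "v1.metadata.json");
        hms.alterTable(initial);

        // Stats-style task, step 1: get the current table (sees v1).
        Map<String, String> stale = hms.getTable();

        // Meanwhile an Iceberg commit advances metadata_location to v2.
        Map<String, String> committed = hms.getTable();
        committed.put("metadata_location", "v2.metadata.json");
        hms.alterTable(committed);

        // Stats-style task, steps 2 and 3: alter its stale copy and send it back.
        stale.put("numRows", "100");
        hms.alterTable(stale);

        return hms.getTable().get("metadata_location");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "v1.metadata.json": the v2 commit was lost
    }
}
```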

      This is a design issue. To solve it while preserving the concurrency features, I propose to make use of HiveIcebergMetaHook, where all such alter_table calls are intercepted, and to use the same locking mechanism there as the one found in HiveTableOperations. The proposed flow on the HMS client side would be:

      • hook: preAlterTable
        • request table level lock
        • refresh the Iceberg table from catalog (HMS) to see if new updates have arrived
        • compare the current metadata with the one assumed to be the base of this request; if metadata_location is outdated, overwrite it in this request with the fresh, current one
      • do the alter_table call to HMS with the relevant changes (updated stats or other properties)
      • hook: post/rollbackAlterTable
        • release table level lock
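
      A minimal sketch of this flow, under stated assumptions: FakeMetastore and MetaHookSketch are hypothetical stand-ins, not the real HMS client or HiveIcebergMetaHook, and a ReentrantLock plays the role of the HMS table lock. Only the lock, refresh, compare, alter, unlock ordering is meant to carry over.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class LockedAlterSketch {
    static final String METADATA_LOCATION = "metadata_location";

    /** Hypothetical stand-in for HMS: alterTable replaces ALL table properties. */
    static class FakeMetastore {
        private Map<String, String> props = new HashMap<>();
        synchronized Map<String, String> getTable() { return new HashMap<>(props); }
        synchronized void alterTable(Map<String, String> newProps) { props = new HashMap<>(newProps); }
    }

    /** Hypothetical hook mirroring the proposed preAlterTable / post-rollback steps. */
    static class MetaHookSketch {
        private final FakeMetastore hms;
        private final ReentrantLock tableLock = new ReentrantLock(); // stands in for the HMS table lock

        MetaHookSketch(FakeMetastore hms) { this.hms = hms; }

        void preAlterTable(Map<String, String> requested) {
            tableLock.lock(); // request table level lock
            try {
                // Refresh from the catalog and compare with the request's base.
                String current = hms.getTable().get(METADATA_LOCATION);
                if (current != null && !current.equals(requested.get(METADATA_LOCATION))) {
                    requested.put(METADATA_LOCATION, current); // overwrite the outdated value
                }
            } catch (RuntimeException e) {
                tableLock.unlock(); // do not leak the lock if the refresh fails
                throw e;
            }
        }

        void postOrRollbackAlterTable() {
            if (tableLock.isHeldByCurrentThread()) {
                tableLock.unlock(); // release table level lock
            }
        }
    }

    /** Replays the race with the hook in place; returns "<metadata_location>,<numRows>". */
    static String run() {
        FakeMetastore hms = new FakeMetastore();
        MetaHookSketch hook = new MetaHookSketch(hms);
        Map<String, String> initial = new HashMap<>();
        initial.put(METADATA_LOCATION, "v1.metadata.json");
        hms.alterTable(initial);

        Map<String, String> stale = hms.getTable(); // task reads the table at v1

        // Iceberg commit path: bypasses the hook, as HiveTableOperations would.
        Map<String, String> committed = hms.getTable();
        committed.put(METADATA_LOCATION, "v2.metadata.json");
        hms.alterTable(committed);

        stale.put("numRows", "100"); // the task's actual change
        hook.preAlterTable(stale);
        try {
            hms.alterTable(stale);
        } finally {
            hook.postOrRollbackAlterTable();
        }

        Map<String, String> result = hms.getTable();
        return result.get(METADATA_LOCATION) + "," + result.get("numRows");
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "v2.metadata.json,100"
    }
}
```

      In this toy run both the Iceberg commit and the stats update survive, because the request's stale metadata_location is replaced before it is sent.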

      This can work because metadata_location should never be changed by anything other than HiveTableOperations, which is the only component not using this hook (if it did, we'd be in an endless loop). There is one exception: a user who wants to change metadata_location by hand. I can accommodate that case by signalling it through an environmentContext instance when the corresponding AlterTableSetPropertiesDesc is constructed.
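
      One way the manual-change exception could look, as a sketch only: the context is modelled as a plain string map, and the flag name manual_metadata_location_change is a hypothetical placeholder, not an existing Hive property.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualOverrideSketch {
    static final String METADATA_LOCATION = "metadata_location";
    // Hypothetical flag name, set when the user explicitly alters metadata_location.
    static final String MANUAL_OVERRIDE_FLAG = "manual_metadata_location_change";

    /**
     * Decides which metadata_location the alter_table request should carry.
     * With the flag set, the user's value is kept; otherwise the fresh value
     * from the catalog always wins.
     */
    static String resolveLocation(String currentInCatalog,
                                  Map<String, String> requested,
                                  Map<String, String> envContext) {
        boolean manual = "true".equalsIgnoreCase(envContext.get(MANUAL_OVERRIDE_FLAG));
        return manual ? requested.get(METADATA_LOCATION) : currentInCatalog;
    }

    public static void main(String[] args) {
        Map<String, String> requested = new HashMap<>();
        requested.put(METADATA_LOCATION, "custom.metadata.json");

        Map<String, String> ctx = new HashMap<>();
        System.out.println(resolveLocation("v2.metadata.json", requested, ctx)); // v2.metadata.json

        ctx.put(MANUAL_OVERRIDE_FLAG, "true");
        System.out.println(resolveLocation("v2.metadata.json", requested, ctx)); // custom.metadata.json
    }
}
```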

            People

              szita Ádám Szita

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 10m