Details
-
Improvement
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
Impala 3.3.0
-
None
-
Description
Impala can leak locks/transactions in the following ways:
1. commit/abort fails (e.g. because HMS is down) - currently Impala has no retry logic for these cases
2. Impala exits (e.g. crash or SIGTERM) and doesn't close its open locks/transactions
Hive has timeout logic that should drop locks/transactions if no heartbeat was received for them for some time. I think that Impala should not rely on this completely though, because:
- the timeout can be too long for some workflows, especially if exclusive locks are involved
- the timeout logic doesn't seem to work in the Impala minicluster
Leak type 1. could be solved by periodically retrying the abort for transactions where commit/abort failed, similarly to how heartbeating works now
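A minimal sketch of the periodic retry idea, not actual Impala code. The `AbortRetryQueue` class and the `TxnAborter` interface are hypothetical names introduced for illustration; `TxnAborter` stands in for whatever metastore call Impala would use to abort a transaction.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AbortRetryQueue {
  /** Hypothetical stand-in for the metastore call that aborts a transaction. */
  public interface TxnAborter {
    void abort(long txnId) throws Exception;
  }

  private final TxnAborter aborter;
  // Ids of transactions whose commit/abort failed and still need cleanup.
  private final Set<Long> pending = ConcurrentHashMap.newKeySet();

  public AbortRetryQueue(TxnAborter aborter) {
    this.aborter = aborter;
  }

  /** Registers a transaction whose commit/abort failed. */
  public void add(long txnId) {
    pending.add(txnId);
  }

  /** Tries to abort every pending transaction once; returns how many remain. */
  public int retryOnce() {
    for (Long txnId : pending) {
      try {
        aborter.abort(txnId);
        pending.remove(txnId);
      } catch (Exception e) {
        // Assumption: any failure (e.g. HMS still down) means "try again later".
      }
    }
    return pending.size();
  }

  /** Runs retryOnce() periodically on a background thread. */
  public ScheduledExecutorService start(long periodSeconds) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(
        this::retryOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }
}
```

The retry could probably run on the same thread as the heartbeat, since both are periodic metastore calls. A real implementation would also have to distinguish "HMS unreachable" from "transaction already aborted or timed out", and drop the id in the latter case.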
Leak type 2. could be solved by having each impalad/catalogd role abort, during startup, the open locks/transactions that it had opened earlier, so that whatever was leaked during exit would be cleared during restart. This would need impalad/catalogd to use unique "user" fields when acquiring locks/transactions instead of the current "impala".
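A rough sketch of the startup cleanup, under stated assumptions. `StartupTxnCleanup`, `OpenTxn`, `TxnStore` and the `impala-<role>-<host:port>` user format are all hypothetical; `TxnStore` stands in for the metastore calls that list and abort open transactions. Only transactions are shown, locks would be handled the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class StartupTxnCleanup {
  /** Hypothetical view of an open transaction: its id and the "user" that opened it. */
  public static final class OpenTxn {
    public final long id;
    public final String user;

    public OpenTxn(long id, String user) {
      this.id = id;
      this.user = user;
    }
  }

  /** Hypothetical stand-in for the metastore client. */
  public interface TxnStore {
    List<OpenTxn> listOpenTxns();
    void abort(long txnId);
  }

  /** Builds a per-role "user" value; the format is an assumption. */
  public static String uniqueUser(String role, String hostPort) {
    return "impala-" + role + "-" + hostPort;
  }

  /** Aborts every open transaction owned by myUser and returns the aborted ids. */
  public static List<Long> abortLeaked(TxnStore store, String myUser) {
    List<Long> aborted = new ArrayList<>();
    for (OpenTxn txn : store.listOpenTxns()) {
      // Transactions of other roles or other systems must be left alone.
      if (myUser.equals(txn.user)) {
        store.abort(txn.id);
        aborted.add(txn.id);
      }
    }
    return aborted;
  }
}
```

The user value has to be stable across restarts of the same role instance, otherwise the restarted process cannot recognize what it leaked; this is why the sketch derives it from the role and host:port rather than from something like a process id.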
Note that the transactions opened by INSERT seem problematic, as these are opened and possibly aborted by the coordinator, but the commit happens on catalogd.