Details
Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Description
IMPALA-10594 and IMPALA-10590 do not ensure that queued queries are removed from the admission controller and from admission_state_map_. If a coordinator is killed before it gets a chance to call GetQueryStatus(), which calls WaitOnQueued() for queued queries, nothing cleans up those queries. This results in a memory leak: the queue_node in the admission controller and the admission_state entry in admission_state_map_ are never removed.
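The leak described above can be illustrated with a small model. This is a minimal sketch, not Impala's AdmissionController: the class `AdmissionModel` and all of its methods are hypothetical, a `std::deque` stands in for the queue nodes, and a `std::unordered_map` stands in for admission_state_map_. It shows that when the cleanup normally reached through WaitOnQueued() never runs, both structures keep the entry, and that an assumed per-coordinator cleanup would release it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical model, NOT Impala code: each queued query has a node in a FIFO
// queue and an entry in a per-query state map (query_id -> coordinator id).
struct QueuedQuery {
  std::string query_id;
  std::string coord_id;  // coordinator that submitted the query
};

class AdmissionModel {
 public:
  void Enqueue(const std::string& query_id, const std::string& coord_id) {
    assert(state_map_.count(query_id) == 0);  // each query is queued at most once
    queue_.push_back({query_id, coord_id});
    state_map_[query_id] = coord_id;
  }

  // Normal path: the submitting coordinator polls for status and a
  // WaitOnQueued()-style call removes the query from both structures.
  void FinishWaitOnQueued(const std::string& query_id) {
    RemoveFromQueue([&](const QueuedQuery& q) { return q.query_id == query_id; });
    state_map_.erase(query_id);
  }

  // Assumed fix: once a coordinator is known to be gone, drop every queued
  // query it submitted, since nobody will call FinishWaitOnQueued() for them.
  std::size_t RemoveQueriesOfCoordinator(const std::string& coord_id) {
    std::size_t removed = 0;
    for (auto it = state_map_.begin(); it != state_map_.end();) {
      if (it->second == coord_id) {
        it = state_map_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    RemoveFromQueue([&](const QueuedQuery& q) { return q.coord_id == coord_id; });
    return removed;
  }

  std::size_t queue_size() const { return queue_.size(); }
  std::size_t state_map_size() const { return state_map_.size(); }

 private:
  // Erase-remove idiom over the deque.
  template <typename Pred>
  void RemoveFromQueue(Pred pred) {
    queue_.erase(std::remove_if(queue_.begin(), queue_.end(), pred), queue_.end());
  }

  std::deque<QueuedQuery> queue_;
  std::unordered_map<std::string, std::string> state_map_;
};
```

For example, if q1 (submitted by a coordinator that is then killed) and q2 are queued and only q2's coordinator completes the WaitOnQueued()-style path, q1 remains in both the queue and the map until something like `RemoveQueriesOfCoordinator()` is invoked for the dead coordinator.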
Moreover, queued queries can get into an undesirable state: if the failed coordinator is not in cluster_membership, the query stays in the queue indefinitely, because every dequeue attempt hits the "unable to dequeue" condition for a coordinator that is not yet registered in cluster_membership.
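The stuck-in-queue behaviour can be sketched as a dequeue pass that only admits a query when its coordinator appears in cluster membership. This is a hypothetical model, not Impala's dequeue loop: `Queued`, `DequeueResult`, `TryDequeue()` and the `max_attempts` bound are all assumptions introduced for illustration, and bounding the number of attempts is only one possible mitigation, not a documented fix.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_set>

// Hypothetical model, NOT Impala code. A queued query can only be dequeued once
// its coordinator is registered in cluster membership. If that coordinator has
// failed and never re-registers, the check fails on every pass.
struct Queued {
  std::string query_id;
  std::string coord_id;
  int failed_attempts = 0;
};

struct DequeueResult {
  std::size_t admitted = 0;
  std::size_t rejected = 0;  // dropped because the coordinator never showed up
};

// max_attempts is an assumed knob for the mitigation; it is not an Impala flag.
DequeueResult TryDequeue(std::deque<Queued>& queue,
                         const std::unordered_set<std::string>& membership,
                         int max_attempts) {
  DequeueResult result;
  std::deque<Queued> still_waiting;
  for (Queued& q : queue) {
    if (membership.count(q.coord_id) > 0) {
      ++result.admitted;           // coordinator registered: admit the query
    } else if (++q.failed_attempts >= max_attempts) {
      ++result.rejected;           // give up instead of waiting forever
    } else {
      still_waiting.push_back(q);  // "unable to dequeue": retry on the next pass
    }
  }
  queue.swap(still_waiting);
  return result;
}
```

Without the `max_attempts` branch, a query whose coordinator is absent from `membership` would be pushed back into `still_waiting` on every pass, which is the indefinite wait described above.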
Another undesirable condition arises for queued queries that were canceled: they are never removed from admission_state_map_. Entries in that map are only removed when a running query is released; running queries are synced via the admission heartbeat, and all running queries are removed when the coordinator goes down. (Here, "running queries" means queries that have been successfully admitted.)
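The cancellation gap can be shown with a toy state map in which, as the issue describes, entries are erased only when a running (admitted) query is released. This is a minimal sketch, not Impala code: `AdmissionStateMap`, `QueryState`, `CancelWhileQueued()` and its `remove_on_cancel` switch are hypothetical, and removing the entry on cancellation is an assumed fix used only to contrast the two behaviours.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

enum class QueryState { kQueued, kRunning, kCanceled };

// Hypothetical model, NOT Impala code, of a per-query admission state map.
class AdmissionStateMap {
 public:
  void Queue(const std::string& id) { states_[id] = QueryState::kQueued; }
  void Admit(const std::string& id) { states_.at(id) = QueryState::kRunning; }

  // As described in the issue: entries are removed only when an admitted
  // (running) query is released.
  void Release(const std::string& id) {
    auto it = states_.find(id);
    if (it != states_.end() && it->second == QueryState::kRunning) states_.erase(it);
  }

  // remove_on_cancel toggles the assumed fix. With it off, a query canceled
  // while still queued is never admitted, so Release() never erases its entry.
  void CancelWhileQueued(const std::string& id, bool remove_on_cancel) {
    auto it = states_.find(id);
    if (it == states_.end() || it->second != QueryState::kQueued) return;
    if (remove_on_cancel) {
      states_.erase(it);
    } else {
      it->second = QueryState::kCanceled;
    }
  }

  std::size_t size() const { return states_.size(); }

 private:
  std::unordered_map<std::string, QueryState> states_;
};
```

Here an admitted query that is released disappears from the map, while a query canceled in the queue lingers unless its entry is removed on the cancellation path itself.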