  IMPALA-10767 (sub-task of IMPALA-9155, Single Admission Controller per Cluster)

Fix handling of queued queries for coordinator failure modes and during cancellation


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      IMPALA-10594 and IMPALA-10590 do not ensure that queued queries are removed from the admission controller and from admission_state_map_. If a coordinator is killed before it gets a chance to call GetQueryStatus(), which calls WaitOnQueued() for queued queries, nothing cleans up after those queries. The result is a memory leak: the queue_node in the admission controller and the admission_state in admission_state_map_ are never removed.
      Moreover, a queued query can get stuck. If the failed coordinator is not in cluster_membership, the query stays in the queue indefinitely, because every attempt to dequeue it hits the "unable to dequeue" condition that applies when the coordinator is not yet registered in cluster_membership.
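
The two failure modes above can be illustrated with a small toy model. This is only a sketch, not Impala's implementation: the class `ToyAdmissionService`, its methods, and the head-of-queue dequeue rule are hypothetical simplifications, and `purge_coordinator` is one assumed way a cleanup could look, not the fix adopted for this issue. Only the names `admission_state_map`, queue node, and `cluster_membership` echo terms from the description.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyAdmissionService:
    """Toy stand-in for the admission service; not Impala's real classes."""
    cluster_membership: set = field(default_factory=set)      # registered coordinator ids
    queue: deque = field(default_factory=deque)               # queue nodes: (query_id, coord_id)
    admission_state_map: dict = field(default_factory=dict)   # query_id -> coord_id

    def submit(self, query_id, coord_id):
        # Queue the query and record its admission state.
        self.queue.append((query_id, coord_id))
        self.admission_state_map[query_id] = coord_id

    def try_dequeue(self):
        """Admit the head of the queue if its coordinator is registered.

        Returns the admitted query id, or None if nothing can be dequeued.
        """
        if not self.queue:
            return None
        query_id, coord_id = self.queue[0]
        if coord_id not in self.cluster_membership:
            # "Unable to dequeue": the coordinator looks like it has not registered yet.
            return None
        self.queue.popleft()
        return query_id

    def purge_coordinator(self, coord_id):
        """Hypothetical cleanup: drop everything owned by a failed coordinator."""
        self.queue = deque(node for node in self.queue if node[1] != coord_id)
        owned = [q for q, c in self.admission_state_map.items() if c == coord_id]
        for query_id in owned:
            del self.admission_state_map[query_id]


def demo():
    svc = ToyAdmissionService(cluster_membership={"coord-1"})
    svc.submit("q1", "coord-1")
    svc.cluster_membership.discard("coord-1")        # coordinator is killed
    attempts = [svc.try_dequeue() for _ in range(3)]  # the query never leaves the queue
    before = (len(svc.queue), len(svc.admission_state_map))
    svc.purge_coordinator("coord-1")
    after = (len(svc.queue), len(svc.admission_state_map))
    return attempts, before, after
```

In this model, `demo()` shows every dequeue attempt failing and both the queue node and the map entry surviving until `purge_coordinator` is called explicitly. The point is that some component other than the dead coordinator has to trigger the cleanup.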

      Queued queries that were canceled have a related problem: they are never removed from admission_state_map_. Entries in the map are removed only when a running query is released, when running queries are synced via the admission heartbeat, or when all running queries are removed because the coordinator goes down. None of these paths covers a query that was canceled while still queued. (Here "running queries" refers to queries that have been successfully admitted.)
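
A toy model of the three removal paths makes the gap for canceled queued queries easier to see. Everything here is an illustrative assumption: the `State` enum, the function names, and the `erase_entry` flag are hypothetical and do not correspond to Impala's actual code. The flag models one possible remedy, erasing the map entry on the cancellation path itself.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"      # successfully admitted
    CANCELED = "canceled"    # canceled while still queued


# state_map maps query_id -> (coord_id, State)

def release_query(state_map, query_id):
    """Path 1: a running query is released."""
    entry = state_map.get(query_id)
    if entry is not None and entry[1] is State.RUNNING:
        del state_map[query_id]


def heartbeat_sync(state_map, coord_id, running_ids):
    """Path 2: drop running queries the coordinator no longer reports."""
    for qid, (coord, state) in list(state_map.items()):
        if coord == coord_id and state is State.RUNNING and qid not in running_ids:
            del state_map[qid]


def coordinator_down(state_map, coord_id):
    """Path 3: drop all running queries of a coordinator that went down."""
    for qid, (coord, state) in list(state_map.items()):
        if coord == coord_id and state is State.RUNNING:
            del state_map[qid]


def cancel_queued(state_map, query_id, erase_entry=False):
    """Cancel a queued query; erase_entry=True models the hypothetical fix."""
    coord, state = state_map[query_id]
    if state is not State.QUEUED:
        return
    if erase_entry:
        del state_map[query_id]
    else:
        state_map[query_id] = (coord, State.CANCELED)


def leaked_after_cancel(erase_entry):
    """Run every removal path and return the query ids still in the map."""
    state_map = {
        "q1": ("coord-1", State.RUNNING),
        "q2": ("coord-1", State.QUEUED),
    }
    cancel_queued(state_map, "q2", erase_entry=erase_entry)
    release_query(state_map, "q1")
    release_query(state_map, "q2")   # no-op: q2 was never admitted
    heartbeat_sync(state_map, "coord-1", running_ids=set())
    coordinator_down(state_map, "coord-1")
    return sorted(state_map)
```

With `erase_entry=False`, the canceled query `q2` is still in the map after all three paths have run, which mirrors the leak described above. With `erase_entry=True`, the map ends up empty.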

          People

            Bikramjeet Vig (bikramjeet.vig)
