IMPALA-6907

ImpalaServer::MembershipCallback() may not remove all stale connections to disconnected Impalad nodes


Details

    Description

      Currently, ImpalaServer::MembershipCallback() removes stale connections to hosts that were removed from the cluster membership:

            while (loc_entry != query_locations_.end()) {
              if (current_membership.find(loc_entry->first) == current_membership.end()) {
                unordered_set<TUniqueId>::const_iterator query_id = loc_entry->second.begin();
                // Add failed backend locations to all queries that ran on that backend.
                for(; query_id != loc_entry->second.end(); ++query_id) {
                  vector<TNetworkAddress>& failed_hosts = queries_to_cancel[*query_id];
                  failed_hosts.push_back(loc_entry->first);
                }
                exec_env_->impalad_client_cache()->CloseConnections(loc_entry->first);  // <<<----- stale connections are closed only here
      

      However, this logic relies on query_locations_, which is populated only when this Impalad node is acting as a coordinator and has queries currently running on the disconnected backend. As a result, ImpalaServer::MembershipCallback() does not reliably remove stale connections to hosts removed from the cluster. Stale connections may stay in the connection cache for an extended period and cause query failures when they are reused after the removed hosts rejoin the cluster.
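
The gap can be illustrated with a small standalone model. This is only a sketch and not Impala code: `Host`, `RecordingCache`, and `CloseViaQueryLocations` are hypothetical stand-ins, with a plain string in place of TNetworkAddress and an int in place of TUniqueId. It mirrors the shape of the excerpt above, where only hosts that appear as keys in query_locations_ are ever checked against the membership.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Host = std::string;  // hypothetical stand-in for TNetworkAddress

// Hypothetical stand-in for the client cache: it only records which hosts
// had their cached connections closed.
struct RecordingCache {
  std::vector<Host> closed;
  void CloseConnections(const Host& host) { closed.push_back(host); }
};

// Models the current behavior: a host is considered only if it is a key in
// query_locations, i.e. only if this node coordinates a query running on it.
void CloseViaQueryLocations(const std::map<Host, std::set<int>>& query_locations,
                            const std::set<Host>& current_membership,
                            RecordingCache* cache) {
  for (const auto& entry : query_locations) {
    if (current_membership.count(entry.first) == 0) {
      cache->CloseConnections(entry.first);
    }
  }
}
```

With an empty query_locations map, a host that has left the membership is never passed to CloseConnections(), so any cached connections to it would survive in this model.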

      Instead, we should remove stale connections regardless of whether this node happens to be currently coordinating a query using that backend.
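
One possible shape for this, as a minimal sketch under stated assumptions rather than the actual fix: remember the previous membership snapshot and close connections to every host that is missing from the new one, independent of query_locations_. The names `FakeClientCache` and `CloseStaleConnections`, and the use of strings for host addresses, are hypothetical and chosen only for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the client cache: it only records which hosts
// had their cached connections closed.
struct FakeClientCache {
  std::vector<std::string> closed;
  void CloseConnections(const std::string& host) { closed.push_back(host); }
};

// Closes cached connections to every host that was in the previous membership
// snapshot but is absent from the current one. No query bookkeeping is
// consulted. Returns the snapshot to remember for the next callback.
std::set<std::string> CloseStaleConnections(const std::set<std::string>& previous,
                                            const std::set<std::string>& current,
                                            FakeClientCache* cache) {
  for (const std::string& host : previous) {
    if (current.find(host) == current.end()) {
      cache->CloseConnections(host);
    }
  }
  return current;
}
```

Diffing two membership snapshots keeps the cleanup decoupled from whether this node is coordinating anything; the trade-off in this sketch is that the caller has to retain the previous snapshot between callbacks.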

            People

              kwho Michael Ho