Cassandra / CASSANDRA-19187

nodetool assassinate may cause thread serialization for that node

Details

    Description

      Assassinating an IP address that is not in the gossip map inserts a "corrupted" entry into the gossip map. (1)

      For example, after running "nodetool assassinate 10.1.1.1", "nodetool gossipinfo" shows an entry like the one below:

       

      /10.1.1.1
        generation:1702006511
        heartbeat:9999
        STATUS:209516:LEFT,-8393921141401589197,1702265651923
        STATUS_WITH_PORT:209515:LEFT,-8393921141401589197,1702265651923
        TOKENS: not present 

       

      This entry in endpointStateMap causes a problem for the isUpgradingFromVersionLowerThan function. Because the entry has no known version, upgradeFromVersionSupplier always sets the allHostsHaveKnownVersion flag to false, so no memoized value is ever returned. As a result, every call to the "get" function has to acquire a lock at this line.

      If the application uses "fetchAll", the native-transport-requests threads will hit this line. This means all the native-transport-requests threads are serialized. The lock is also shared with the GossipStage threads, so if a node in a cluster with the corrupted gossip map is restarted, that node will run into this problem.
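
      The effect described above can be illustrated with a minimal sketch. This is a simplified, hypothetical model and not Cassandra's actual supplier class: the names MemoizingSupplierSketch, Result, and lockedComputations are invented for illustration. It assumes a double-checked memoizing supplier that only caches results flagged as cacheable, so a computation that is never cacheable sends every caller through the synchronized block.

```java
import java.util.function.Supplier;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of a memoizing supplier (not Cassandra's real class). */
final class MemoizingSupplierSketch
{
    /** One computed value plus a flag saying whether it may be memoized. */
    static final class Result
    {
        final String value;       // e.g. a minimum version; null when unknown
        final boolean cacheable;  // false when some endpoint has no known version

        Result(String value, boolean cacheable)
        {
            this.value = value;
            this.cacheable = cacheable;
        }
    }

    private final Supplier<Result> compute;
    private volatile Result cached;

    /** Counts how many times a caller had to compute while holding the lock. */
    final AtomicInteger lockedComputations = new AtomicInteger();

    MemoizingSupplierSketch(Supplier<Result> compute)
    {
        this.compute = compute;
    }

    String get()
    {
        Result r = cached;
        if (r != null)
            return r.value;               // fast path: memoized, no lock taken

        synchronized (this)               // slow path: callers queue up here
        {
            r = cached;
            if (r != null)
                return r.value;

            lockedComputations.incrementAndGet();
            Result fresh = compute.get();
            if (fresh.cacheable)
                cached = fresh;           // a non-cacheable result is never stored
            return fresh.value;
        }
    }
}
```

      With a computation that always reports a non-cacheable result, as the corrupted entry is described as causing, every call to get() takes the lock; once a result is cacheable, only the first call does.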

      To fix the issue:

      1. Why do we add a dummy entry for nodetool assassinate when the endpoint is no longer in the map? Should we do nothing, or throw an exception, if the node is not in the gossip map anymore?
      2. Before checking whether a version is null, we should make sure the node is not dead. A decommissioned or left node should no longer be considered part of the cluster when calculating "upgradeInProgressPossible".
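
      Both suggestions can be sketched roughly as follows. This is an illustrative model only, not a patch against Cassandra: the Endpoint class, the DEAD_STATUSES set, the method names, and the example version strings are all assumptions made for the sketch, and the real set of dead states may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class UpgradeCheckSketch
{
    /** Hypothetical stand-in for a gossip entry: a status and a possibly-null release version. */
    static final class Endpoint
    {
        final String status;
        final String releaseVersion;

        Endpoint(String status, String releaseVersion)
        {
            this.status = status;
            this.releaseVersion = releaseVersion;
        }
    }

    // Assumption: statuses treated as "no longer part of the cluster" in this sketch.
    static final Set<String> DEAD_STATUSES = new HashSet<>(Arrays.asList("LEFT", "removed"));

    /** Suggestion 2: ignore dead endpoints before looking at their version. */
    static boolean allLiveHostsHaveKnownVersion(Map<String, Endpoint> states)
    {
        for (Endpoint e : states.values())
        {
            if (DEAD_STATUSES.contains(e.status))
                continue;                 // a left node should not block memoization
            if (e.releaseVersion == null)
                return false;             // a live node with an unknown version
        }
        return true;
    }

    /** Suggestion 1 (one option): reject an address that gossip does not know about. */
    static void requireKnownEndpoint(Map<String, Endpoint> states, String address)
    {
        if (!states.containsKey(address))
            throw new IllegalArgumentException("Endpoint " + address + " is not in the gossip map");
    }

    public static void main(String[] args)
    {
        Map<String, Endpoint> states = new LinkedHashMap<>();
        states.put("10.1.1.2", new Endpoint("NORMAL", "4.1.3")); // illustrative version
        states.put("10.1.1.1", new Endpoint("LEFT", null));      // entry like the one above
        System.out.println(allLiveHostsHaveKnownVersion(states));
    }
}
```

      In this model the LEFT entry for 10.1.1.1 is skipped, so the check depends only on nodes that are still cluster members.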

       


          People

            curlylrt Runtian Liu
            Runtian Liu
            Brandon Williams, Stefan Miklosovic