Solr / SOLR-17107

Leader election is unpredictable if two threads concurrently join the election for the same replica

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 8.11, 9.3
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None

    Description

      There is a race condition in leader election if two threads concurrently run the election for the same replica. This is not about how leader election is distributed across multiple Solr nodes, but about how multiple threads in a single Solr node conflict with each other.
       
      Overall, when two threads (on the same server) concurrently join leader election for the same replica, the outcome is unpredictable. It may end with two nodes thinking they are the leader, or with no leader at all.
       

      How to reproduce

      I identified two scenarios, but maybe there are more:
       
      1. ZooKeeper session expires while an election is already in progress.
      When we re-create the ZooKeeper session, we re-register all the cores and join elections for all of them. If an election is already in progress or is triggered for any reason, we can have two threads on the same Solr server node running leader election for the same core.
       
      2. Command REJOINLEADERELECTION is received twice concurrently for the same core.
      This scenario is much easier to reproduce with an external client. It occurs for us since we have customizations using this command.
       

      Full analysis

      There are at least two issues in the current code.

      1. We blindly delete ZK nodes that were created by other threads

      Right after we create our ephemeral sequential ZK node to join the election queue, we check whether there are other ZK nodes for the same session ID (so the same Solr server). When other nodes are found, we just delete them, but we don't stop the election for any of the threads. It is likely that both threads will think they won the election.

      In addition, if two threads join the election concurrently, it is possible that each deletes the sequential node of the other. In the end, no node remains in the queue. So if another node joins the election later, it will not see that there may already be a leader.
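
      The interleaving described above can be reproduced with a small in-memory model. This is only a sketch: the class BlindDeleteRace, its methods, and the "session-1" ID are hypothetical stand-ins, not Solr or ZooKeeper APIs. A barrier forces both threads to create their node before either one runs the cleanup, which is the worst case.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CyclicBarrier;

/** In-memory stand-in for the election queue; hypothetical, not Solr code. */
public class BlindDeleteRace {
    // sequence number -> session ID that created the node
    private final ConcurrentSkipListMap<Integer, String> queue = new ConcurrentSkipListMap<>();
    private int nextSeq = 0;

    /** Mimics creating an ephemeral sequential node (atomic, like the real server-side create). */
    synchronized int createNode(String sessionId) {
        int seq = nextSeq++;
        queue.put(seq, sessionId);
        return seq;
    }

    /** Mimics the blind cleanup: delete every other node owned by the same session. */
    void deleteOtherNodesOfSession(String sessionId, int mySeq) {
        queue.entrySet().removeIf(e -> e.getValue().equals(sessionId) && e.getKey() != mySeq);
    }

    /** Runs two threads joining the same election; returns how many nodes are left. */
    public static int runRace() {
        BlindDeleteRace zk = new BlindDeleteRace();
        CyclicBarrier bothCreated = new CyclicBarrier(2);
        Runnable joinElection = () -> {
            int mySeq = zk.createNode("session-1");
            try {
                bothCreated.await(); // force both creates to happen before any delete
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            zk.deleteOtherNodesOfSession("session-1", mySeq);
            // Neither thread aborts here: both would carry on with the election.
        };
        Thread a = new Thread(joinElection);
        Thread b = new Thread(joinElection);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return zk.queue.size();
    }

    public static void main(String[] args) {
        // Each thread deleted the other's node, so this prints "nodes left in queue: 0".
        System.out.println("nodes left in queue: " + runRace());
    }
}
```

      In this model both threads keep going while the queue is empty, which matches the "no node remains" outcome.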

      The fix for this issue would be to have one of the two threads abort the election, without deleting the node of the other thread.
      The election process should be continued only by the thread with the smallest sequence number in the queue.
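
      A minimal sketch of that rule, using the same kind of in-memory model. The class SmallestSequenceWins and its methods are hypothetical and not part of Solr. The sketch assumes node creation is atomic, and it ignores the case where the older node belongs to an election that has already completed.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for the election queue; hypothetical, not Solr code. */
public class SmallestSequenceWins {
    // sequence number -> session ID that created the node
    private final ConcurrentSkipListMap<Integer, String> queue = new ConcurrentSkipListMap<>();
    private int nextSeq = 0;

    /** Mimics creating an ephemeral sequential node (assumed atomic). */
    synchronized int createNode(String sessionId) {
        int seq = nextSeq++;
        queue.put(seq, sessionId);
        return seq;
    }

    /** True if a node with a smaller sequence number exists for the same session. */
    boolean hasOlderNodeOfSession(String sessionId, int mySeq) {
        return queue.headMap(mySeq).containsValue(sessionId);
    }

    /** Returns true if the caller should continue the election, false if it aborted. */
    boolean joinElection(String sessionId) {
        int mySeq = createNode(sessionId);
        if (hasOlderNodeOfSession(sessionId, mySeq)) {
            queue.remove(mySeq); // remove only our own node, never the other thread's
            return false;        // abort: the thread with the smaller sequence continues
        }
        return true;
    }

    /** Runs two threads; returns {threads that continued, nodes left in the queue}. */
    public static int[] runTwoThreads() {
        SmallestSequenceWins zk = new SmallestSequenceWins();
        AtomicInteger continued = new AtomicInteger();
        Runnable task = () -> {
            if (zk.joinElection("session-1")) {
                continued.incrementAndGet();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {continued.get(), zk.queue.size()};
    }

    public static void main(String[] args) {
        int[] result = runTwoThreads();
        // Exactly one thread continues and exactly one node stays: "continued=1, nodes=1".
        System.out.println("continued=" + result[0] + ", nodes=" + result[1]);
    }
}
```

      Because the thread holding the smallest sequence number never deletes anything, the queue cannot end up empty in this model, whatever the interleaving.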

      2. Mutability around LeaderElector and contexts

      Another issue is that any thread can change the context of LeaderElector instances. This can be done either by invoking setup() (mostly after ZK session expiration) or retryElection().
      When we change the context, the old one is closed, but we don't take into account the exact state of the election if another thread is currently joining with the old context.
      I am not sure exactly what the fix for this would be.
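
      The hazard can be illustrated with a simplified model. The Context and Elector classes below are hypothetical stand-ins that only borrow the setup() name; they do not reflect the real LeaderElector code. Latches force the problematic ordering: one thread captures the context, a second thread swaps and closes it, then the first thread resumes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: hypothetical classes, not the real Solr implementation. */
public class ContextSwapHazard {

    /** Hypothetical stand-in for an election context. */
    static final class Context {
        private final AtomicBoolean closed = new AtomicBoolean();
        void close() { closed.set(true); }
        boolean isClosed() { return closed.get(); }
    }

    /** Hypothetical stand-in for an elector whose context any thread may replace. */
    static final class Elector {
        private final AtomicReference<Context> context = new AtomicReference<>();

        /** Replaces the context and closes the old one, whatever its election state. */
        void setup(Context next) {
            Context old = context.getAndSet(next);
            if (old != null) {
                old.close();
            }
        }

        Context current() { return context.get(); }
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the joining thread ends up working with a closed context. */
    public static boolean joinerEndsUpWithClosedContext() {
        Elector elector = new Elector();
        elector.setup(new Context());
        CountDownLatch joinerCaptured = new CountDownLatch(1);
        CountDownLatch contextSwapped = new CountDownLatch(1);
        AtomicBoolean sawClosed = new AtomicBoolean();

        Thread joiner = new Thread(() -> {
            Context mine = elector.current(); // starts joining with the old context
            joinerCaptured.countDown();
            await(contextSwapped);            // meanwhile another thread calls setup()
            sawClosed.set(mine.isClosed());   // the joiner still holds the old context
        });
        Thread swapper = new Thread(() -> {
            await(joinerCaptured);
            elector.setup(new Context());     // closes the context the joiner is using
            contextSwapped.countDown();
        });
        joiner.start();
        swapper.start();
        try {
            joiner.join();
            swapper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawClosed.get();
    }

    public static void main(String[] args) {
        // Prints "joiner holds a closed context: true".
        System.out.println("joiner holds a closed context: " + joinerEndsUpWithClosedContext());
    }
}
```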

          People

            Assignee: Unassigned
            Reporter: Pierre Salagnac (pierre.salagnac)
            Votes: 2
            Watchers: 3