Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-2791

Quorum doesn't recover after zxid rollover

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 3.3.6, 3.4.8
    • None
    • leaderElection, quorum
    • None
    • Ubuntu 14.04.4 LTS, AWS EC2, 5 node ensembles

    Description

      When zxid rolls over the ensemble is unable to recover without manually restarting the cluster. The leader enters shutdown() state when zxid rolls over, but the remaining four nodes in the ensemble are not able to re-elect a new leader. This state has persisted for at least 15 minutes before an operator manually restarted the cluster and the ensemble recovered.

      Config:
      --------
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/raid0/zookeeper
      clientPort=2181
      maxClientCnxns=100
      autopurge.snapRetainCount=14
      autopurge.purgeInterval=24
      leaderServes: True
      server.7=172.26.134.88:2888:3888
      server.6=172.26.136.143:2888:3888
      server.5=172.26.135.103:2888:3888
      server.4=172.26.134.16:2888:3888
      server.9=172.26.135.19:2888:3888

      Logs:

      https://gist.github.com/mheffner/d615d358d4a360ae56a0d0a280040640

      Attachments

        Issue Links

          Activity

            People

              abrahamfine Abraham Fine
              mheffner Mike Heffner
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated: