MESOS-5193 (Mesos)

Recovery failed: Failed to recover registrar on reboot of mesos master


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 0.22.0, 0.27.0
    • Fix Version/s: None
    • Component/s: master
    • Important

    Description

      Hi all,

      We are running a 3-node cluster with a Mesos master, a Mesos slave, and ZooKeeper on every node, with Chronos running on top. When we reboot the leading Mesos master, the other masters try to get elected as leader but fail to recover the registrar with this error:
      "Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins"

      The next node then tries to become the leader but fails with the same error. We are not sure what is causing this. We were running Mesos 0.22 and have also tried upgrading to Mesos 0.27, but the problem persists.

      /usr/sbin/mesos-master --work_dir=/tmp/mesos_dir --zk=zk://node1:2181,node2:2181,node3:2181/mesos --quorum=2
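      The command above starts each of the three masters with --quorum=2. The Mesos high-availability guidance is that the quorum for the replicated-log registry should be a strict majority of the masters, i.e. floor(N/2) + 1, which for N = 3 is 2, so rebooting one master should still leave enough masters to meet the quorum. The short sketch below only checks that arithmetic; the helper names majority_quorum and has_quorum are hypothetical and are not part of Mesos, and the sketch does not diagnose the registrar fetch timeout itself.

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest strict majority of num_masters (hypothetical helper, not Mesos code)."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


def has_quorum(masters_up: int, quorum: int) -> bool:
    """True if the masters still running can satisfy the configured quorum."""
    return masters_up >= quorum


if __name__ == "__main__":
    # Cluster from this report: 3 masters started with --quorum=2.
    total, configured = 3, 2
    print(majority_quorum(total))             # 2, matches --quorum=2
    print(has_quorum(total - 1, configured))  # True: one master rebooting leaves 2 up
    print(has_quorum(total - 2, configured))  # False: two masters down loses the quorum
```

      In other words, with 3 masters and --quorum=2 a single reboot should not by itself leave the cluster without a quorum.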

      Can you please help us resolve this issue? It is a production system.

      Thanks,
      Priyanka

    Attachments

      1. node3.log (37 kB, Priyanka Gupta)
      2. node3_after_work_dir.log (36 kB, Priyanka Gupta)
      3. node2.log (48 kB, Priyanka Gupta)
      4. node2_after_work_dir.log (76 kB, Priyanka Gupta)
      5. node1.log (48 kB, Priyanka Gupta)
      6. node1_after_work_dir.log (66 kB, Priyanka Gupta)
      7. full.log (190 kB, Benjamin Mahler)

    People

      Assignee: Unassigned
      Reporter: Priyanka Gupta (prigupta)
      Votes: 0
      Watchers: 6