  1. Ambari
  2. AMBARI-20593

EU/RU Auto-Retry does not reschedule task when host is not heartbeating before task is scheduled and doesn't have a start time

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.5.0
    • 2.5.1
    • ambari-server
    • rolling upgrade

    Description

      STR:
      1) Install Ambari 2.5.0.1.
      In the ambari.properties file, set:
      stack.upgrade.auto.retry.timeout.mins=6
      stack.upgrade.auto.retry.check.interval.secs=30

      2) Install HDP with any set of services
      3) Add NameNode HA
      4) Register and install new HDP stack version
      5) Start RU
      6) Corrupt one step from the Core Masters group (e.g., stop ambari-agent on a node while the command is running).
      Ambari will restart the "Restarting NN Batch 1" step.
      7) Fix the corrupted step (e.g., start ambari-agent again)
      8) Corrupt another step before its command is scheduled (e.g., stop ambari-agent on a node)
      9) Fix the corrupted step (e.g., start ambari-agent again)

      The expectation is that Ambari Server schedules the command on the 2nd node. However, because the command never received an original_start_time or a start_time, the RetryUpgradeActionService could not retry it: it had no timestamps to compare against.
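
The failure mode described above can be illustrated with a minimal Java sketch. This is not the RetryUpgradeActionService code and not necessarily what the attached patches do; the class, the Task type, and both method names are hypothetical. It assumes the retry decision compares the current time against the task's original start time plus stack.upgrade.auto.retry.timeout.mins, and it shows one possible fallback (a caller-supplied reference time) for tasks that were never scheduled.

```java
import java.util.concurrent.TimeUnit;

public class RetryWindowSketch {

    /** Hypothetical stand-in for a command; a null timestamp means "never set". */
    static final class Task {
        final Long originalStartTimeMs;

        Task(Long originalStartTimeMs) {
            this.originalStartTimeMs = originalStartTimeMs;
        }
    }

    /**
     * Timestamp-only check: a task without an original start time is never
     * retried, which mirrors the behavior described in this report.
     */
    static boolean canRetryNaive(Task task, long nowMs, long timeoutMins) {
        if (task.originalStartTimeMs == null) {
            return false; // nothing to compare against
        }
        return nowMs - task.originalStartTimeMs <= TimeUnit.MINUTES.toMillis(timeoutMins);
    }

    /**
     * One possible alternative (an assumption, not the confirmed fix): fall back
     * to a reference time, such as when the failure was first observed, so that
     * a never-scheduled task still gets a retry window.
     */
    static boolean canRetryWithFallback(Task task, long fallbackMs, long nowMs, long timeoutMins) {
        long reference = (task.originalStartTimeMs != null) ? task.originalStartTimeMs : fallbackMs;
        return nowMs - reference <= TimeUnit.MINUTES.toMillis(timeoutMins);
    }

    public static void main(String[] args) {
        long now = 3_600_000L;   // arbitrary "current time" in milliseconds
        long timeoutMins = 6;    // stack.upgrade.auto.retry.timeout.mins=6

        Task started = new Task(now - 60_000L); // ran one minute ago
        Task neverScheduled = new Task(null);   // agent was down before scheduling

        System.out.println(canRetryNaive(started, now, timeoutMins));
        System.out.println(canRetryNaive(neverScheduled, now, timeoutMins));
        System.out.println(canRetryWithFallback(neverScheduled, now - 60_000L, now, timeoutMins));
    }
}
```

With a 6-minute timeout and a 30-second check interval, a task inside its window would be examined roughly 12 times before the window closes; a task with no timestamps is skipped on every pass under the timestamp-only check.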

      Attachments

        1. AMBARI-20593.trunk.patch
          10 kB
          Alejandro Fernandez
        2. AMBARI-20593.branch-2.5.patch
          10 kB
          Alejandro Fernandez

            People

              afernandez Alejandro Fernandez
              stereshchenko Sviatoslav Tereshchenko