Ambari / AMBARI-24201

Command reschedule does not work, causing blueprint deployments to time out


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 2.7.0
    • None
    • None

    Description

      When a stage times out or command delivery fails during a blueprint
      install, the server usually reschedules the running command by sending a
      cancel command along with the repeated execution command.

      The bug is that the agent cancels the very command that needs to be newly
      scheduled.
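
The interaction described above can be illustrated with a toy model. This is a minimal sketch, not Ambari's ActionQueue: the class `ToyActionQueue`, the `cancelled_task_ids` set, and the `clear_stale_cancel` flag are all hypothetical names. It assumes one plausible mechanism consistent with the log below, namely that a cancel for a task id with no running process leaves a marker behind that then hits the re-sent command carrying the same task id. The guarded variant is only one possible mitigation and is not necessarily what the attached patch does.

```python
from collections import deque


class ToyActionQueue:
    """Hypothetical stand-in for an agent-side queue; not Ambari's ActionQueue."""

    def __init__(self):
        self.queue = deque()
        self.cancelled_task_ids = set()

    def cancel(self, task_id):
        # Drop queued copies and remember the id in case a process is running.
        self.queue = deque(c for c in self.queue if c["taskId"] != task_id)
        self.cancelled_task_ids.add(task_id)

    def put(self, command, clear_stale_cancel=False):
        # Assumed mitigation: a freshly scheduled command supersedes an
        # earlier cancel marker for the same task id.
        if clear_stale_cancel:
            self.cancelled_task_ids.discard(command["taskId"])
        self.queue.append(command)

    def run_next(self):
        command = self.queue.popleft()
        if command["taskId"] in self.cancelled_task_ids:
            return "CANCELED"
        return "EXECUTED"


def reschedule(queue, task_id, clear_stale_cancel):
    # The server sends the cancel and the repeated execution command together.
    queue.cancel(task_id)
    queue.put({"taskId": task_id, "role": "ZOOKEEPER_CLIENT"}, clear_stale_cancel)
    return queue.run_next()


buggy_result = reschedule(ToyActionQueue(), 145, clear_stale_cancel=False)
guarded_result = reschedule(ToyActionQueue(), 145, clear_stale_cancel=True)
print(buggy_result, guarded_result)  # CANCELED EXECUTED
```

In the unguarded run the rescheduled command is treated as canceled, which matches the symptom reported here: the command never runs and the deployment eventually times out.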

      2018-06-27 01:34:58,105 WARN [agent-message-retry-0] MessageEmitter:255 - Reschedule execution command emitting, retry: 1, messageId: 19

      ..., u'cancelCommands': [

      {u'commandType': u'CANCEL_COMMAND', u'target_task_id': 145, u'reason': u'Stage timeout'}

      ]}}, u'requiredConfigTimestamp': 1530060845474}
      INFO 2018-06-27 01:34:58,121 ActionQueue.py:115 - Canceling command with taskId = 145
      INFO 2018-06-27 01:34:58,121 ActionQueue.py:134 - Canceling EXECUTION_COMMAND for service ZOOKEEPER and role ZOOKEEPER_CLIENT with taskId 145
      WARNING 2018-06-27 01:34:58,121 CustomServiceOrchestrator.py:129 - Unable to find process associated with taskId = 145
      INFO 2018-06-27 01:34:58,122 ActionQueue.py:103 - Adding EXECUTION_COMMAND for role ZOOKEEPER_CLIENT for service ZOOKEEPER of cluster_id 2 to the queue.
      INFO 2018-06-27 01:34:58,122 security.py:135 - Event to server at /reports/responses (correlation_id=870):

      {'status': 'OK', 'messageId': '19'}

      INFO 2018-06-27 01:34:58,142 __init__.py:57 - Event from server at /user/ (correlation_id=870):

      {u'status': u'OK'}

      INFO 2018-06-27 01:34:59,293 ActionQueue.py:238 - Executing command with id = 10-0, taskId = 145 for role = ZOOKEEPER_CLIENT of cluster_id 2.
      INFO 2018-06-27 01:34:59,294 security.py:135 - Event to server at /reports/commands_status (correlation_id=871): {'clusters': {u'2': [

      {'status': 'IN_PROGRESS', 'taskId': 145, 'tmpout': '/var/lib/ambari-agent/data/output-145.txt', 'roleCommand': u'INSTALL', 'structuredOut': '/var/lib/ambari-agent/data/structured-out-145.json', 'clusterId': u'2', 'serviceName': u'ZOOKEEPER', 'role': u'ZOOKEEPER_CLIENT', 'actionId': u'10-0', 'tmperr': '/var/lib/ambari-agent/data/errors-145.txt'}

      ]}}
      INFO 2018-06-27 01:34:59,295 ActionQueue.py:279 - Command execution metadata - taskId = 145, retry enabled = True, max retry duration (sec) = 1200, log_output = True
      INFO 2018-06-27 01:34:59,296 ActionQueue.py:285 - Command with taskId = 145 canceled
      ERROR 2018-06-27 01:34:59,296 ActionQueue.py:221 - Exception while processing EXECUTION_COMMAND command
      Traceback (most recent call last):
        File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 214, in process_command
          self.execute_command(command)
        File "/usr/lib/ambari-agent/lib/ambari_agent/ActionQueue.py", line 354, in execute_command
          commandresult['stdout'] += '\n\nCommand completed successfully!\n' if status == self.COMPLETED_STATUS else '\n\nCommand failed after ' + str(numAttempts) + ' tries\n'
      UnboundLocalError: local variable 'commandresult' referenced before assignment
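
The traceback above is the typical result of a retry loop that exits before its first attempt and then uses the result variable after the loop. The following is a minimal sketch of that pattern, assuming the cancel check happens at the top of the loop; the function names `unguarded` and `guarded` and the callables `run_once` and `is_canceled` are hypothetical and do not reproduce the code in ActionQueue.py.

```python
def unguarded(run_once, is_canceled, max_attempts=3):
    """Hypothetical retry loop that fails when canceled before the first try."""
    num_attempts = 0
    status = "FAILED"
    while num_attempts < max_attempts:
        if is_canceled():
            break  # leaves the loop before `result` is ever assigned
        num_attempts += 1
        result = run_once()
        if result["exitcode"] == 0:
            status = "COMPLETED"
            break
    # Raises UnboundLocalError if the loop broke out on its first iteration.
    result["stdout"] += "\n\nCommand failed after " + str(num_attempts) + " tries\n"
    return status, result


def guarded(run_once, is_canceled, max_attempts=3):
    """Same loop, with the result initialised before the loop starts."""
    num_attempts = 0
    status = "FAILED"
    result = {"exitcode": 1, "stdout": "", "stderr": "Command canceled before execution"}
    while num_attempts < max_attempts:
        if is_canceled():
            break
        num_attempts += 1
        result = run_once()
        if result["exitcode"] == 0:
            status = "COMPLETED"
            break
    result["stdout"] += (
        "\n\nCommand completed successfully!\n"
        if status == "COMPLETED"
        else "\n\nCommand failed after " + str(num_attempts) + " tries\n"
    )
    return status, result


def succeed():
    # Stand-in for a command run that exits cleanly.
    return {"exitcode": 0, "stdout": "ok", "stderr": ""}


canceled_status, canceled_result = guarded(succeed, lambda: True)
ok_status, ok_result = guarded(succeed, lambda: False)
```

Initialising the result only removes the secondary exception. The rescheduled command is still reported as canceled unless the stale cancel itself is handled.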

      Attachments

        1. AMBARI-24201.patch
          0.9 kB
          Andrew Onischuk
        2. AMBARI-24201.patch
          0.9 kB
          Andrew Onischuk
        3. AMBARI-24201.patch
          0.9 kB
          Andrew Onischuk

          People

            Assignee: Andrew Onischuk (aonishuk)
            Reporter: Andrew Onischuk (aonishuk)
            Votes: 0
            Watchers: 2
