Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.9.2
-
None
Description
While using jclouds 1.9.2, while attempting to stop a VM in aws-ec2:us-east-1 I hit the error below:
Failed after 39.1s: Error invoking stop at MySqlNodeImpl{id=zrxtil48g6}: AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'} org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException: Error invoking stop at MySqlNodeImpl{id=zrxtil48g6}: AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'} at org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.propagate(EffectorUtils.java:318) at org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.access$100(EffectorUtils.java:285) at org.apache.brooklyn.core.mgmt.internal.EffectorUtils.handleEffectorException(EffectorUtils.java:326) at org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$2.handleException(EffectorTasks.java:90) at org.apache.brooklyn.util.core.task.DynamicSequentialTask.handleException(DynamicSequentialTask.java:469) at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:417) at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'} at org.apache.brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:129) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.doStop(MachineLifecycleEffectorTasks.java:863) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stop(MachineLifecycleEffectorTasks.java:743) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopEffectorBody.call(MachineLifecycleEffectorTasks.java:308) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopEffectorBody.call(MachineLifecycleEffectorTasks.java:305) at org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$1.call(EffectorTasks.java:82) at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359) ... 5 more Caused by: java.util.concurrent.ExecutionException: org.jclouds.aws.AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'} at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.doStop(MachineLifecycleEffectorTasks.java:851) ... 10 more Caused by: org.jclouds.aws.AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', code='Unavailable', message='The service is unavailable. Please try again shortly.', context='{Response=, Errors=}'} at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75) at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:67) at org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:136) at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:105) at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90) at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73) at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44) at org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117) at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87) at com.sun.proxy.$Proxy160.describeInstancesInRegion(Unknown Source) at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getRunningInstanceInRegion(EC2GetNodeMetadataStrategy.java:64) at org.jclouds.aws.ec2.compute.strategy.AWSEC2GetNodeMetadataStrategy.getRunningInstanceInRegion(AWSEC2GetNodeMetadataStrategy.java:52) at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getNode(EC2GetNodeMetadataStrategy.java:56) at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:42) at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:28) at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:46) at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:31) at org.jclouds.util.Predicates2$RetryablePredicate.apply(Predicates2.java:117) at org.jclouds.compute.internal.BaseComputeService.doDestroyNode(BaseComputeService.java:302) at org.jclouds.compute.internal.BaseComputeService.destroyNode(BaseComputeService.java:243) at org.apache.brooklyn.location.jclouds.JcloudsLocation.releaseNode(JcloudsLocation.java:2701) at org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2651) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:1021) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:875) at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:873) ... 6 more
Digging into the jclouds wire log, I see:
2016-09-13 21:00:09,811 DEBUG o.j.r.i.InvokeHttpMethod [brooklyn-execmanager-YxAEEeJK-3722]: >> invoking DescribeInstances 2016-09-13 21:00:09,812 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [brooklyn-execmanager-YxAEEeJK-3722]: Sending request -1247513718: POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 2016-09-13 21:00:09,812 DEBUG jclouds.wire [brooklyn-execmanager-YxAEEeJK-3722]: >> "Action=DescribeInstances&InstanceId.1=i-331bcea0&Version=2012-06-01" 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> Host: ec2.us-east-1.amazonaws.com 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> X-Amz-Date: 20160913T200009Z 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> Authorization: AWS4-HMAC-SHA256 Credential=AAAAAAAAAAAAAAAAAAAA/20160913/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=84cf762b171d56299b096a106733e72bbf96d5654b4c607135cea8cd7be51689 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> Content-Type: application/x-www-form-urlencoded 2016-09-13 21:00:09,812 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: >> Content-Length: 67 2016-09-13 21:00:10,294 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [brooklyn-execmanager-YxAEEeJK-3722]: Receiving response -1247513718: HTTP/1.1 503 Service Unavailable 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << HTTP/1.1 503 Service Unavailable 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << Date: Tue, 13 Sep 2016 20:00:10 GMT 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << Transfer-Encoding: chunked 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << Connection: close 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << Server: AmazonEC2 2016-09-13 21:00:10,294 DEBUG jclouds.headers [brooklyn-execmanager-YxAEEeJK-3722]: << Content-Type: application/unknown 2016-09-13 21:00:10,296 DEBUG jclouds.wire [brooklyn-execmanager-YxAEEeJK-3722]: << "<?xml version="1.0" encoding="UTF-8"?>[\n]" 2016-09-13 21:00:10,296 DEBUG jclouds.wire [brooklyn-execmanager-YxAEEeJK-3722]: << "<Response><Errors><Error><Code>Unavailable</Code><Message>The service is unavailable. Please try again shortly.</Message></Error></Errors><RequestID>d4a38d1b-4f02-4161-b7ff-6cd78bdefe41</RequestID></Response>"
In the jclouds code org.jclouds.aws.config.AWSHttpApiModule.provideRetryableServerCodes, I see that "Unavailable" is not a code that for which we'll retry (we'll only retry on server errors "RequestLimitExceeded" and "InternalError".
(These codes are passed into AWSServerErrorRetryHandler.shouldRetryRequestOnError.)
Looking at http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html, I think we should also retry on:
- ServiceUnavailable: The request has failed due to a temporary failure of the server.
- Unavailable: The server is overloaded and can't handle the request.
I also wonder for client errors whether we should retry on:
- RequestExpired (given that we retry on SignatureDoesNotMatch - will we automatically reauthenticate?).
- ConcurrentTagAccess: You can't run simultaneous commands to modify a tag for a specific resource. Allow sufficient wait time for the previous request to complete, then retry your request. For more information, see Error Retries and Exponential Backoff in AWS.