Details
Type: Improvement
Status: Open
Priority: P3
Resolution: Unresolved
Description
When the Dataflow Runner submits a job for remote execution, the request can, in rare cases, fail with a retriable error. The Dataflow Runner could recognize a class of retriable errors and resubmit the job when it encounters one. A sample retriable error encountered by the Beam Java SDK:
```
java.lang.RuntimeException: Failed to create a workflow job: The operation was cancelled.
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:869)
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:178)
11:32:14 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
11:32:14 at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
...
11:32:14 Caused by:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 499 Client Closed Request
11:32:14 {
11:32:14 "code" : 499,
11:32:14 "errors" : [
],
11:32:14 "message" : "The operation was cancelled.",
11:32:14 "status" : "CANCELLED"
11:32:14 }
11:32:14 at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
11:32:14 at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
11:32:14 at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
11:32:14 at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
11:32:14 at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
11:32:14 at org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
11:32:14 at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:855)
11:32:14 ... 41 more'
```
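One way the proposed behavior could look is sketched below. This is only an illustration under stated assumptions, not the Dataflow Runner's actual code: `StatusException` is a hypothetical stand-in for `GoogleJsonResponseException`, `submitWithRetry` and `isRetriable` are made-up names, and the set of retriable status codes (499 and 503 here) is a policy choice that would need to be decided for the real runner.

```java
import java.util.Set;
import java.util.concurrent.Callable;

public class SubmitRetrySketch {

  /** Hypothetical stand-in for an HTTP-level failure such as GoogleJsonResponseException. */
  static class StatusException extends RuntimeException {
    final int statusCode;

    StatusException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }
  }

  // Assumption: these codes are treated as retriable; the real list is a policy decision.
  static final Set<Integer> RETRIABLE_CODES = Set.of(499, 503);

  /** Walks the cause chain, because the trace above shows the HTTP error wrapped in a RuntimeException. */
  static boolean isRetriable(Throwable t) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof StatusException
          && RETRIABLE_CODES.contains(((StatusException) cur).statusCode)) {
        return true;
      }
    }
    return false;
  }

  /** Calls submit up to maxAttempts times, doubling the wait after each retriable failure. */
  static <T> T submitWithRetry(Callable<T> submit, int maxAttempts, long initialBackoffMillis)
      throws Exception {
    long backoff = initialBackoffMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return submit.call();
      } catch (Exception e) {
        // Give up on non-retriable errors or once the attempt budget is spent.
        if (attempt >= maxAttempts || !isRetriable(e)) {
          throw e;
        }
        Thread.sleep(backoff);
        backoff *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated submission: fails twice with a wrapped 499, then succeeds.
    String jobId = submitWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new RuntimeException(
            "Failed to create a workflow job",
            new StatusException(499, "The operation was cancelled."));
      }
      return "job-123";
    }, 5, 10L);
    System.out.println(jobId + " after " + calls[0] + " attempts");
  }
}
```

A real implementation would also need to consider that a request reported as failed may still have created the job on the service side, so a blind resubmit could start a duplicate job.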