Beam / BEAM-7628

Retry createJob requests in Dataflow Runner for retriable errors.

Details

    • Type: Improvement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-dataflow
    • Labels: None

    Description

      When the Dataflow Runner submits a job for remote execution, the createJob request can, in rare cases, fail with a retriable error. The Dataflow Runner could recognize a class of retriable errors and resubmit the job when it encounters one. Sample retriable error encountered by the Beam Java SDK:

      ```
      java.lang.RuntimeException: Failed to create a workflow job: The operation was cancelled.
          at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:869)
          at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:178)
          at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
          at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
          ...
      Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 499 Client Closed Request
      {
        "code" : 499,
        "errors" : [ {
          "domain" : "global",
          "message" : "The operation was cancelled.",
          "reason" : "backendError"
        } ],
        "message" : "The operation was cancelled.",
        "status" : "CANCELLED"
      }
          at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
          at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
          at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
          at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
          at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
          at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
          at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
          at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
          at org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
          at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:855)
          ... 41 more
      ```
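
One way the proposed retry could look is sketched below. This is a minimal, self-contained sketch under stated assumptions, not Beam's code: `RetryingSubmitter`, `HttpStatusException` and `Sleeper` are hypothetical names, and treating HTTP 499 (CANCELLED, as in the trace above) and 503 as the retriable class is an assumption. In the real runner the wrapped call would be the `DataflowClient.createJob` request seen in the trace, and the status code would come from the `GoogleJsonResponseException` it throws.

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical sketch: retry a job-submission call on retriable HTTP errors. */
public final class RetryingSubmitter {

  /** Hypothetical stand-in for an API exception that carries an HTTP status. */
  public static final class HttpStatusException extends IOException {
    private final int statusCode;

    public HttpStatusException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }

    public int getStatusCode() {
      return statusCode;
    }
  }

  /** Hypothetical hook so callers (and tests) control how waiting happens. */
  public interface Sleeper {
    void sleep(long millis) throws InterruptedException;
  }

  // Assumption: 499 (CANCELLED) and 503 (UNAVAILABLE) are retriable; all else fails fast.
  private static final Set<Integer> RETRIABLE_CODES = Set.of(499, 503);

  private final int maxAttempts;
  private final long initialBackoffMillis;
  private final Sleeper sleeper;

  public RetryingSubmitter(int maxAttempts, long initialBackoffMillis, Sleeper sleeper) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    this.maxAttempts = maxAttempts;
    this.initialBackoffMillis = initialBackoffMillis;
    this.sleeper = sleeper;
  }

  /** Returns true only for exceptions whose status code is in the retriable set. */
  static boolean isRetriable(Exception e) {
    return e instanceof HttpStatusException
        && RETRIABLE_CODES.contains(((HttpStatusException) e).getStatusCode());
  }

  /** Runs {@code createJob}, retrying retriable failures with exponential backoff. */
  public <T> T submit(Callable<T> createJob) throws Exception {
    long backoff = initialBackoffMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return createJob.call();
      } catch (Exception e) {
        if (attempt >= maxAttempts || !isRetriable(e)) {
          throw e; // non-retriable error, or no attempts left: surface it unchanged
        }
        sleeper.sleep(backoff);
        backoff *= 2; // double the wait before the next attempt
      }
    }
  }
}
```

Failing fast on anything outside the retriable set keeps genuine errors (bad pipeline options, permission problems) from being masked by retries. Because a request that failed on the client side may still have reached the service, a real implementation would also need to ensure a retry cannot launch a second copy of the same job.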

People

    • Assignee: Unassigned
    • Reporter: Valentyn Tymofieiev (tvalentyn)
    • Votes: 1
    • Watchers: 6
