Details
-
Bug
-
Status: Resolved
-
P0
-
Resolution: Fixed
-
2.3.0
-
None
Description
Dataflow Python jobs currently fail with 2.3.0 RC1 because the worker Docker image is missing. This is not a bug in the SDK; the worker image needs to be published by Google. I will coordinate the worker image publication.
# Update to your own project and bucket.
GCS_BUCKET=my-cloud-storage-bucket
GCP_PROJECT=my-cloud-project
virtualenv env
. env/bin/activate
wget https://dist.apache.org/repos/dist/dev/beam/2.3.0/apache-beam-2.3.0-python.zip
pip install apache-beam-2.3.0-python.zip[gcp]
python -m apache_beam.examples.wordcount --input gs://dataflow-samples/shakespeare/kinglear.txt --output gs://${GCS_BUCKET}/counts --runner DataflowRunner --project ${GCP_PROJECT} --temp_location gs://${GCS_BUCKET}/tmp --sdk_location apache-beam-2.3.0-python.zip
Dataflow logs contain:
E Handler for GET /v1.27/images/dataflow.gcr.io/v1beta3/python:2.3.0/json returned error: No such image: dataflow.gcr.io/v1beta3/python:2.3.0
E container start failed: ImagePullBackOff: Back-off pulling image "dataflow.gcr.io/v1beta3/python:2.3.0"
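To check whether the worker image has been published, without launching a full job, one option is to try pulling it directly. The sketch below is only an illustration: the helper names `worker_image` and `worker_image_available` are hypothetical, the registry path is copied from the log lines above, and reusing that path for other SDK versions is an assumption. It also assumes Docker is installed and that the registry is reachable from the machine running the check.

```shell
#!/bin/sh
# Hypothetical helper: build the worker image reference for an SDK version.
# The registry path comes from the Dataflow log lines in this report;
# applying it to versions other than 2.3.0 is an assumption.
worker_image() {
  printf 'dataflow.gcr.io/v1beta3/python:%s\n' "$1"
}

# Hypothetical helper: succeed only if the image can be pulled.
# `docker pull` exits non-zero when the image is missing or not accessible,
# so a failure here does not by itself distinguish the two cases.
worker_image_available() {
  docker pull "$(worker_image "$1")" >/dev/null 2>&1
}

# Example usage (requires Docker and network access):
#   if worker_image_available 2.3.0; then
#     echo "worker image published"
#   else
#     echo "worker image not available"
#   fi
```

A failed pull is consistent with the ImagePullBackOff error above, but it may also indicate an authentication or network problem, so treat the result as a hint rather than a definitive answer.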