Details
- Type: Bug
- Status: Resolved
- Priority: P2
- Resolution: Fixed
- Affects Versions: 2.35.0, 2.36.0, 2.37.0
- Fix Version: None
- Environment: Cloud Dataflow Workbench Notebook on GCP, Apache Beam 2.37.0 Kernel for Python 3
Description
When using a combination of the Python InteractiveRunner and beam.io.ReadFromBigQuery, the canonical examples from the Beam Python tutorials for BigQuery trigger an exception that appears to result from a failed attempt to serialize generators:
notebook.py

    pipeline = beam.Pipeline(InteractiveRunner(), options=options)
    max_temperatures = (
        pipeline
        | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
            query='SELECT max_temperature FROM '\
                  '`clouddataflow-readonly.samples.weather_stations`',
            use_standard_sql=True,
            gcs_location=gcs_location)
        # Each row is a dictionary where the keys are the BigQuery columns
        | beam.Map(lambda elem: elem['max_temperature']))
    pipeline.run()
    ~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
        800       protocol = pickle.HIGHEST_PROTOCOL
        801       return coder_impl.CallbackCoderImpl(
    --> 802           lambda x: dumps(x, protocol), pickle.loads)
        803
        804     def as_deterministic_coder(self, step_label, error_message=None):

    TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
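For context on the error itself (background, not from the report): the failing coder at coders.py line 802 falls back to pickling arbitrary values, and Python's pickle module cannot serialize generator objects because they carry live frame state. A minimal stdlib-only sketch of that limitation, unrelated to any Beam API:

    import pickle

    def numbers():
        # A generator function; calling it returns a generator object.
        yield from range(3)

    gen = numbers()

    # Generator objects hold interpreter frame state that pickle
    # cannot serialize, so dumps() raises TypeError.
    err_type = None
    try:
        pickle.dumps(gen)
    except TypeError as err:
        err_type = type(err).__name__
    print(err_type)

    # Materializing the generator into a list first yields picklable data.
    data = pickle.loads(pickle.dumps(list(numbers())))
    print(data)

(The exact message varies by Python version; 3.7 reports "can't pickle generator objects".)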
The interactive pipeline works as expected in Apache Beam 2.34.0.
Attachments
Issue Links
- duplicates: BEAM-14112 "ReadFromBigQuery cannot be used with the interactive runner" (status: Triage Needed)