Details
- Type: Bug
- Status: Resolved
- Priority: P2
- Resolution: Fixed
- Affects Versions: 2.35.0, 2.36.0, 2.37.0
- Fix Version: None
- Environment: Cloud Dataflow Workbench Notebook on GCP, Apache Beam 2.37.0 Kernel for Python 3
Description
When using a combination of the Python InteractiveRunner and beam.io.ReadFromBigQuery, the canonical examples from the Beam Python tutorials for BigQuery trigger an exception that appears to result from a failed attempt to serialize generators:
notebook.py

    pipeline = beam.Pipeline(InteractiveRunner(), options=options)
    max_temperatures = (
        pipeline
        | 'QueryTableStdSQL' >> beam.io.ReadFromBigQuery(
            query='SELECT max_temperature FROM '\
                  '`clouddataflow-readonly.samples.weather_stations`',
            use_standard_sql=True,
            gcs_location=gcs_location)
        # Each row is a dictionary where the keys are the BigQuery columns
        | beam.Map(lambda elem: elem['max_temperature']))
    pipeline.run()
    ~/apache-beam-2.37.0/lib/python3.7/site-packages/apache_beam/coders/coders.py in <lambda>(x)
        800       protocol = pickle.HIGHEST_PROTOCOL
        801       return coder_impl.CallbackCoderImpl(
    --> 802           lambda x: dumps(x, protocol), pickle.loads)
        803
        804     def as_deterministic_coder(self, step_label, error_message=None):

    TypeError: can't pickle generator objects [while running '[6]: QueryTableStdSQL/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
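For context on the error itself (background, not from the report): the failing coder at coders.py line 802 falls back to pickling arbitrary values, and Python's pickle module cannot serialize generator objects because they carry live frame state. A minimal stdlib-only sketch of that limitation, unrelated to any Beam API:

    import pickle

    def numbers():
        # A generator function; calling it returns a generator object.
        yield from range(3)

    gen = numbers()

    # Generator objects hold interpreter frame state that pickle
    # cannot serialize, so dumps() raises TypeError.
    err_type = None
    try:
        pickle.dumps(gen)
    except TypeError as err:
        err_type = type(err).__name__
    print(err_type)

    # Materializing the generator into a list first yields picklable data.
    data = pickle.loads(pickle.dumps(list(numbers())))
    print(data)

(The exact message varies by Python version; 3.7 reports "can't pickle generator objects".)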
The interactive pipeline works as expected in Apache Beam 2.34.0.
Attachments
Issue Links
- duplicates: BEAM-14112 "ReadFromBigQuery cannot be used with the interactive runner" (status: Triage Needed)