Beam / BEAM-7439

Bigquery Write with schema None: TypeError: 'NoneType' object has no attribute '__getitem__'

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P0
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.13.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      When running a simple write to BigQuery on apache-beam==2.12.0 without passing a schema:

      import apache_beam as beam

      input_data = [{'str': 'test'}]

      # No schema argument is passed, so the transform's schema is None.
      (pipeline
       | 'create' >> beam.Create(input_data)
       | 'write' >> beam.io.WriteToBigQuery('<project-id>:beam_test.test'))

      I get the following error:

      WARNING:root:Start running in the cloud
      Traceback (most recent call last):
        File "test_pipeline.py", line 193, in <module>
          main()
        File "test_pipeline.py", line 183, in main
          '<project-id>:beam_test.test'))
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pvalue.py", line 112, in __or__
          return self.pipeline.apply(ptransform, self)
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 470, in apply
          label or transform.label)
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 480, in apply
          return self.apply(transform, pvalueish)
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/pipeline.py", line 516, in apply
          pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/runners/runner.py", line 193, in apply
          return m(transform, input, options)
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 617, in apply_WriteToBigQuery
          parse_table_schema_from_json(json.dumps(transform.schema)),
        File "/mnt/c/Users/Juta/Documents/02-projects/apache/beam/sdks/venv2/local/lib/python2.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 130, in parse_table_schema_from_json
          fields = [_parse_schema_field(f) for f in json_schema['fields']]
      TypeError: 'NoneType' object has no attribute '__getitem__'
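
      The last two frames suggest the mechanism: a schema of None is serialized to the JSON text "null", which decodes back to None and is then indexed with 'fields'. The following is a minimal sketch of that chain using only the standard library. The function parse_fields is a hypothetical stand-in for parse_table_schema_from_json, and it assumes that function decodes its argument with json.loads before indexing, which the traceback implies but does not show.

```python
import json


def parse_fields(schema_string):
    # Hypothetical stand-in: decode the JSON text, then index 'fields',
    # mirroring the failing line shown in the traceback.
    json_schema = json.loads(schema_string)
    return [f for f in json_schema['fields']]


# With no schema argument, the schema is None and serializes to "null".
encoded = json.dumps(None)

try:
    parse_fields(encoded)
    error = None
except TypeError as exc:
    # json.loads("null") returns None, and None cannot be indexed.
    error = exc

print(repr(encoded), type(error).__name__)
```

      On Python 3 the TypeError message is worded differently from the Python 2.7 message in the report, but the exception type and cause are the same.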

      I already proposed a fix for this as part of a larger PR: https://github.com/apache/beam/pull/8621/commits/41cdfbda5a4e2a56b6d10046ba265ad68c78675d

      I was wondering if this also needs to be patched for version 2.12.0?
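
      The linked commit is not reproduced here. As an illustration only, a guard of roughly the following shape would avoid the error, assuming a missing schema can simply be skipped rather than parsed. The helper names encode_schema and field_names are hypothetical and are not part of Beam.

```python
import json


def encode_schema(schema):
    # Hypothetical helper: JSON text for a dict schema, or None when absent.
    if schema is None:
        return None
    return json.dumps(schema)


def field_names(schema):
    # Only decode and index 'fields' when a schema was actually provided.
    encoded = encode_schema(schema)
    if encoded is None:
        return []
    return [f['name'] for f in json.loads(encoded)['fields']]


print(field_names(None))
print(field_names({'fields': [{'name': 'str', 'type': 'STRING'}]}))
```

      Whether the actual fix skips parsing, substitutes an empty schema, or handles None elsewhere is for the PR to settle; the sketch only shows that the None case has to be checked before the schema is serialized.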

      cc: tvalentyn pabloem

            People

              Assignee: Pablo Estrada (pabloem)
              Reporter: Juta Staes (Juta)
              Votes: 0
              Watchers: 6

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h