  1. Beam
  2. BEAM-7382

BigQuery IO: schema autodetection failing

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels:
      None

      Description

      I am writing integration tests (ITs) for BigQuery IO on the DataflowRunner.
      When testing schema autodetection, I get the following error:

      ERROR: test_big_query_write_schema_autodetect (apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
        File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/io/gcp/bigquery_write_it_test.py", line 156, in test_big_query_write_schema_autodetect
          write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))
        File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
          self.run().wait_until_finish()
        File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/pipeline.py", line 419, in run
          return self.runner.run_pipeline(self, self._options)
        File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", line 64, in run_pipeline
          self.result.wait_until_finish(duration=wait_duration)
        File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 1322, in wait_until_finish
          (self.state, getattr(self._runner, 'last_error_msg', None)), self)
      apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
      Workflow failed. Causes: S01:create/Read+write/WriteToBigQuery/NativeWrite failed., BigQuery import job "dataflow_job_18059625072014532771-B" failed., BigQuery job "dataflow_job_18059625072014532771-B" in project "apache-beam-testing" finished with error(s): errorResult: No schema specified on job or table., error: No schema specified on job or table.
      

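      Test code:

      import apache_beam as beam

      # `args` (pipeline options) and `output_table` are defined elsewhere
      # in the test and are not shown here.
      input_data = [
          {'number': 1, 'str': 'abc'},
          {'number': 2, 'str': 'def'},
      ]

      with beam.Pipeline(argv=args) as p:
        # Write the rows without an explicit schema; BigQuery is expected
        # to infer it from the data.
        (p | 'create' >> beam.Create(input_data)
         | 'write' >> beam.io.WriteToBigQuery(
             output_table,
             schema=beam.io.gcp.bigquery.SCHEMA_AUTODETECT,
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_EMPTY))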
      

      Is there something wrong with my test or is this a bug?
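
For comparison, a minimal sketch of how the same rows could be written with an explicit schema instead of autodetection. This is only an assumption about a possible workaround, not a fix for the reported behavior: it avoids the autodetect path rather than exercising it. `schema_string_from_rows` is a hypothetical helper that is not part of Apache Beam; it builds the comma-separated `name:TYPE` string form that `WriteToBigQuery` accepts for its `schema` argument.

```python
# Hypothetical helper (not part of Apache Beam): derive an explicit
# "name:TYPE,..." schema string from sample rows made of flat dicts.
_BQ_TYPES = {bool: 'BOOLEAN', int: 'INTEGER', float: 'FLOAT', str: 'STRING'}


def schema_string_from_rows(rows):
    """Return a schema string such as 'number:INTEGER,str:STRING'."""
    fields = {}  # field name -> BigQuery type, in first-seen order
    for row in rows:
        for name, value in row.items():
            bq_type = _BQ_TYPES.get(type(value))
            if bq_type is None:
                raise TypeError(
                    'unsupported value type for field %r: %r'
                    % (name, type(value)))
            # Reject rows that disagree on the type of a field.
            if fields.setdefault(name, bq_type) != bq_type:
                raise ValueError('conflicting types for field %r' % name)
    return ','.join('%s:%s' % (n, t) for n, t in fields.items())


input_data = [
    {'number': 1, 'str': 'abc'},
    {'number': 2, 'str': 'def'},
]

explicit_schema = schema_string_from_rows(input_data)
print(explicit_schema)  # number:INTEGER,str:STRING
```

In the test above, `schema=explicit_schema` would then take the place of `schema=beam.io.gcp.bigquery.SCHEMA_AUTODETECT`. Whether the job succeeds with an explicit schema has not been verified here.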

      Link to PR: https://github.com/apache/beam/pull/8621
      cc: Valentyn Tymofieiev 

            People

            • Assignee:
              Pablo Estrada (pabloem)
              Reporter:
              Juta Staes (Juta)