Beam / BEAM-6831

Python SDK WriteToBigQuery excessive usage of metered API

Details

    Description

      There is an issue in the Python SDK where beam.io.gcp.bigquery.WriteToBigQuery calls the following API more often than necessary:

      https://www.googleapis.com/bigquery/v2/projects/<project-name>/datasets/<dataset-name>/tables/<table-name>?alt=json

      The above request falls under BigQuery API quotas that are separate from the quotas for BigQuery streaming inserts. When used in a streaming pipeline, we hit this quota quickly and can no longer write any data to BigQuery.
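The endpoint above is the tables.get method of the BigQuery v2 REST API. A minimal illustration of how the request URL is formed (the helper name is hypothetical, purely to show which metered method each check hits):

```python
def tables_get_url(project, dataset, table):
    # Hypothetical helper: builds the tables.get URL that the table-existence
    # check requests. Each call to this endpoint counts against the
    # "API requests per user per method" quota, not the streaming-insert quota.
    return ("https://www.googleapis.com/bigquery/v2/projects/{}/"
            "datasets/{}/tables/{}?alt=json".format(project, dataset, table))
```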

      Dispositions being used are:

      • create_disposition: beam.io.BigQueryDisposition.CREATE_NEVER
      • write_disposition: beam.io.BigQueryDisposition.WRITE_APPEND

      This currently blocks us from using BigQueryIO to write to BigQuery in a streaming pipeline, and required us to formally request an API quota increase from Google as a temporary workaround.
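One way to reduce the call volume (a sketch of a possible client-side mitigation, not the SDK's actual implementation; TableExistenceCache and fetch_exists are hypothetical names) is to memoize the existence check per table, so tables.get is issued at most once per table per worker:

```python
class TableExistenceCache(object):
    """Hypothetical sketch: remember tables.get results so the metered
    endpoint is queried at most once per table per worker."""

    def __init__(self, fetch_exists):
        # fetch_exists: callable (project, dataset, table) -> bool that
        # performs the real tables.get request.
        self._fetch_exists = fetch_exists
        self._cache = {}

    def exists(self, project, dataset, table):
        key = (project, dataset, table)
        if key not in self._cache:
            # Only the first lookup for a table hits the metered API.
            self._cache[key] = self._fetch_exists(project, dataset, table)
        return self._cache[key]
```

With CREATE_NEVER, the result of the check cannot change for the lifetime of the pipeline, so caching it is safe.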

      Our pipeline uses the DataflowRunner. The error is shown below and in the attached screenshot of the Stackdriver trace.

        "errors": [
          {
            "message": "Exceeded rate limits: too many api requests per user per method for this user_method. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors",
            "domain": "usageLimits",
            "reason": "rateLimitExceeded"
          }
        ],
      


          People

            Assignee: Unassigned
            Reporter: Pesach Weinstock
