Beam / BEAM-8841

Add ability to perform BigQuery file loads using Avro

Details

    • Type: Improvement
    • Status: Triage Needed
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.21.0
    • Component/s: io-py-gcp
    • Labels: None

    Description

      Currently, the Python SDK uses JSON for file loads into BigQuery. JSON has some disadvantages, including the size of the serialized data and the inability to represent NaN and infinity float values.

      BigQuery supports loading files in Avro format, which overcomes these disadvantages. The Java SDK already supports loading files in Avro format (BEAM-2879), so it makes sense to support it in the Python SDK as well.
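
      The two JSON disadvantages named above can be illustrated with the standard library alone. The sketch below is not Beam code and does not write a real Avro file. It only assumes that an Avro `double` is stored as 8 little-endian IEEE 754 bytes, and the helper names `json_encode_strict` and `binary_double` are hypothetical, introduced for illustration.

```python
import json
import math
import struct


def json_encode_strict(value):
    """Serialize a float as standards-compliant JSON.

    Returns None when the value has no strict JSON representation
    (NaN, +inf, -inf).
    """
    try:
        # allow_nan=False disables Python's non-standard NaN/Infinity tokens.
        return json.dumps(value, allow_nan=False)
    except ValueError:
        return None


def binary_double(value):
    """Pack a float as 8 little-endian IEEE 754 bytes.

    Assumption: this mirrors how a single Avro `double` value is laid out;
    it is not a complete Avro file (no header, schema, or blocks).
    """
    return struct.pack("<d", value)


if __name__ == "__main__":
    samples = [0.5, 1.0 / 3.0, float("nan"), float("inf"), float("-inf")]
    for x in samples:
        text = json_encode_strict(x)
        packed = binary_double(x)
        restored = struct.unpack("<d", packed)[0]
        # NaN never compares equal to itself, so check it separately.
        same = math.isnan(restored) if math.isnan(x) else restored == x
        json_size = "n/a" if text is None else len(text)
        print(f"{x!r}: json={text!r} ({json_size} chars), "
              f"binary={len(packed)} bytes, round-trips={same}")
```

      Under these assumptions, the binary form has a fixed size per value and round-trips NaN and the infinities, while strict JSON rejects them.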

      The change will likely be in or around BigQueryBatchFileLoads.

          People

            Assignee: Chun Yang (cccyang)
            Reporter: Chun Yang (cccyang)
            Votes: 0
            Watchers: 2

              Time Tracking

                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 9h 40m