  1. Beam
  2. BEAM-7173

BigQuery connector should not enable schema autodetection unless the user explicitly requests it.

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.13.0
    • Component/s: io-py-gcp
    • Labels: None

    Description

      Currently, the BQ_FILE_LOADS insertion method enables schema autodetection: https://github.com/apache/beam/blob/6567f1687d53e491b337ba94f521fa2e4af35e46/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L340

      It may be more user-friendly to let users opt in to schema autodetection in their pipelines, across all use cases of the BQ connector. Schema autodetection is an approximation and does not always work.
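
      A minimal sketch of the opt-in behavior proposed here, not the actual Beam code. The helper name `build_load_config` and its parameters are hypothetical; the dictionary keys follow the field names of the BigQuery load job configuration (`sourceFormat`, `writeDisposition`, `schema`, `autodetect`). The point is only that `autodetect` is set when the caller asks for it and is otherwise left out.

```python
def build_load_config(schema=None, autodetect=False):
    """Hypothetical helper: build a BigQuery load-job configuration dict.

    Schema autodetection is enabled only when the caller opts in.
    """
    config = {
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        "writeDisposition": "WRITE_APPEND",
    }
    if schema is not None:
        # An explicit schema supplied by the user is passed through as-is.
        config["schema"] = schema
    if autodetect:
        # Only set when the user explicitly requested autodetection.
        config["autodetect"] = True
    return config


# Default: no schema and no autodetection, so an existing table's schema applies.
default_config = build_load_config()

# Explicit opt-in by the user.
opt_in_config = build_load_config(autodetect=True)
```

      With this shape, writing to an existing table needs neither a schema nor autodetection, and a user who wants autodetection has to say so.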

      For example, schema autodetection cannot infer whether string data is binary bytes or a textual string, and will always prefer the latter. If schema autodetection is enabled by default, users who need to write 'bytes' data will always have to specify a schema, even when writing to a table that already exists and already has a schema. Otherwise, the autodetected schema will try to write a 'string' entry into a 'bytes' field and the write will fail.
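
      The ambiguity can be illustrated with a toy example. This is not BigQuery's autodetection algorithm: `naive_autodetect` is a hypothetical stand-in, and the sketch assumes bytes values are base64-encoded when rows are serialized to JSON for a file load. After serialization, a bytes payload and an ordinary text field are both plain JSON strings, so a detector looking only at the data has nothing to tell them apart.

```python
import base64
import json


def naive_autodetect(value):
    """Toy stand-in for schema autodetection over one decoded JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


raw = b"\x00\x01binary"
row = {
    "name": "example",
    # Assumption: bytes are base64-encoded to fit into a JSON document.
    "payload": base64.b64encode(raw).decode("ascii"),
}

# Round-trip through JSON, as a file-based load would see the row.
decoded = json.loads(json.dumps(row))
detected = {key: naive_autodetect(value) for key, value in decoded.items()}
```

      Here `detected` maps both `name` and `payload` to `STRING`, which is the mismatch described above when the destination column is of type 'bytes'.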

      Related discussion: https://lists.apache.org/thread.html/1f9d9cb1bbbfca87d74e62ba8e58a15059ed6c20ab419002fcd3f8df@%3Cdev.beam.apache.org%3E

       

            People

              Assignee: Pablo Estrada (pabloem)
              Reporter: Valentyn Tymofieiev (tvalentyn)
              Votes: 0
              Watchers: 5

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3.5h