Beam / BEAM-13250

GCSFileSystem does not allow injecting the GCS URL

Details

    • Type: Improvement
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 2.33.0
    • Fix Version/s: Not applicable
    • Component/s: io-py-gcp

    Description

      `apache_beam.io.gcp.gcsfilesystem.GCSFileSystem` directly instantiates `apache_beam.io.gcp.gcsio.GcsIO` for every operation, which gives callers no way to supply a custom storage client.

      `GcsIO` accepts an optional parameter called `storage_client`. Exposing it would let us override the storage client, for example depending on the run level, and potentially use a GCS emulator locally when testing the infrastructure.

      Right now the only way to override the GCS URL is to monkey-patch the default storage client, which is not the cleanest way of doing it.
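The monkey-patching workaround can be sketched with stand-in classes. Everything below is hypothetical and is not Beam code: `FakeClient`, `FakeGcsIO`, `FakeFileSystem`, `_default_client`, and the emulator URL are illustrative names; only the `storage_client` parameter name comes from the description above.

```python
DEFAULT_URL = "https://storage.googleapis.com"  # default endpoint, for illustration
EMULATOR_URL = "http://localhost:4443"          # hypothetical local emulator


class FakeClient:
    """Stand-in for a storage client; it only remembers its endpoint."""

    def __init__(self, url):
        self.url = url


class FakeGcsIO:
    """Stand-in for GcsIO: takes an optional storage_client."""

    def __init__(self, storage_client=None):
        self.client = storage_client or self._default_client()

    @staticmethod
    def _default_client():
        return FakeClient(DEFAULT_URL)


class FakeFileSystem:
    """Stand-in for GCSFileSystem: callers cannot pass a storage_client."""

    def describe(self, path):
        # Instantiated with no arguments on every operation.
        return f"{FakeGcsIO().client.url}/{path}"


def describe_with_patched_default(fs, path):
    """Monkey-patch the default client for one call, then restore it."""
    original = FakeGcsIO._default_client
    FakeGcsIO._default_client = staticmethod(lambda: FakeClient(EMULATOR_URL))
    try:
        return fs.describe(path)
    finally:
        FakeGcsIO._default_client = staticmethod(original)
```

While the patch is active it affects every `FakeGcsIO` created anywhere in the process, not just the one file system under test, which is the main reason this approach is hard to keep clean.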

       

      Potential solution:

      If `GCSFileSystem` were changed, for example by adding a single class method that creates every one of these `GcsIO` instances, we could define a custom class that inherits from `GCSFileSystem` without duplicating all of its code, and inject the desired URL depending on the run level.
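A minimal sketch of the proposed shape, using stand-in classes rather than the real Beam ones. The names `StubClient`, `StubGcsIO`, `BaseFileSystem`, `EmulatorFileSystem`, `_make_gcsio`, and the emulator URL are assumptions made for illustration; the issue does not specify how the factory method would be named or what it would take as arguments.

```python
class StubClient:
    """Stand-in for a storage client; it only remembers its endpoint."""

    def __init__(self, url):
        self.url = url


class StubGcsIO:
    """Stand-in for GcsIO with its optional storage_client parameter."""

    def __init__(self, storage_client=None):
        self.client = storage_client or StubClient("https://storage.googleapis.com")


class BaseFileSystem:
    """Stand-in for GCSFileSystem with one factory method (hypothetical)."""

    @classmethod
    def _make_gcsio(cls):
        # Single place where the IO object is created.
        return StubGcsIO()

    def open(self, path):
        return f"open {self._make_gcsio().client.url}/{path}"

    def delete(self, path):
        return f"delete {self._make_gcsio().client.url}/{path}"


class EmulatorFileSystem(BaseFileSystem):
    """Subclass that injects a client pointing at a local emulator."""

    EMULATOR_URL = "http://localhost:4443"  # hypothetical emulator address

    @classmethod
    def _make_gcsio(cls):
        return StubGcsIO(storage_client=StubClient(cls.EMULATOR_URL))
```

Because every operation goes through `_make_gcsio`, the subclass overrides one method and inherits `open` and `delete` unchanged, with no global state modified.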

          People

            Assignee: Unassigned
            Reporter: Mathieu Leduc-Hamel (mlhamel)
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 4h 10m