Beam / BEAM-14165

Specify GCS Object Version in apache_beam.io.gcp.gcsio

Details

    • Type: Improvement
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 2.37.0
    • Fix Version/s: None
    • Component/s: io-py-gcp
    • Labels: None

    Description

      I would like to specify a generation (object version) when accessing a GCS object through the Beam filesystem.
      With the gsutil CLI, a specific version can be accessed using the following syntax:

      gsutil cp gs://{bucket}/{object_path}#{generation} .
      

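      The corresponding Python code would look something like this (this is the proposed syntax, not something Beam supports today):

      from apache_beam.io.filesystems import FileSystems

      with FileSystems.open("gs://{bucket}/{object_path}#{generation}") as f:
          data = f.read()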
      

      Fortunately, StorageObjectsGetRequest already accepts a generation.
      However, GcsDownloader does not set it when it builds the request.
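
A minimal sketch of the arguments GcsDownloader could pass, assuming the apitools StorageObjectsGetRequest message that gcsio already uses, with fields `bucket`, `object` and `generation`. The helper name `get_request_kwargs` is hypothetical and not part of Beam. It leaves `generation` unset when no version is requested, so the default behaviour (fetching the live version) is unchanged.

```python
from typing import Any, Dict, Optional


def get_request_kwargs(
    bucket: str, name: str, generation: Optional[int] = None
) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for a StorageObjectsGetRequest.

    `generation` is only included when a specific version was requested;
    omitting it keeps the current behaviour of reading the live object.
    """
    kwargs: Dict[str, Any] = {"bucket": bucket, "object": name}
    if generation is not None:
        kwargs["generation"] = generation
    return kwargs
```

With `apache_beam[gcp]` installed, the result could then be unpacked into the request object, for example `storage.StorageObjectsGetRequest(**get_request_kwargs(bucket, name, generation))`, assuming `storage` is the same generated client module that gcsio imports.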

      I think the generation should also be extracted when the GCS path is parsed, so that it can be passed on to the download request.
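
A minimal sketch of extracting the generation during path parsing, assuming the gsutil convention that a trailing `#<digits>` on the object name is the generation. `parse_gcs_path_with_generation` is a hypothetical helper that stands in for the path parsing gcsio already performs; it is not an existing Beam function.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: gs://<bucket>/<object>[#<generation>]
# The object part is non-greedy so a trailing "#<digits>" is split off
# as the generation instead of being kept in the object name.
_GCS_PATH_RE = re.compile(r"^gs://([^/]+)/(.+?)(?:#(\d+))?$")


def parse_gcs_path_with_generation(
    gcs_path: str,
) -> Tuple[str, str, Optional[int]]:
    """Return (bucket, object_name, generation); generation is None if absent."""
    match = _GCS_PATH_RE.match(gcs_path)
    if match is None:
        raise ValueError(
            f"GCS path must be gs://<bucket>/<object>[#<generation>]: {gcs_path!r}"
        )
    bucket, name, generation = match.groups()
    return bucket, name, int(generation) if generation is not None else None
```

For example, `parse_gcs_path_with_generation("gs://my-bucket/dir/file.txt#1650000000000000")` returns `("my-bucket", "dir/file.txt", 1650000000000000)`. Note that this sketch would misread an object whose name itself ends in `#` followed by digits, so a real implementation would have to decide how to handle such names.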

People

    • Assignee: Unassigned
    • Reporter: Lasse Karls (l_karls)
    • Votes: 0
    • Watchers: 2

Time Tracking

    • Original Estimate: 2h
    • Remaining Estimate: 2h
    • Time Spent: Not Specified