BEAM-13493

fileio calls mkdirs with basepath and not dirname(full_file_name)

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 2.34.0, 2.35.0
    • Fix Version/s: None
    • Component/s: io-py-files
    • Labels: None

    Description

      When apache_beam.io.fileio.WriteToFiles is called with a file_naming argument that adds a directory to the path, the current implementation fails to write files whenever the underlying file storage requires a mkdirs (or analogous) call before writing.

      For example, given:

      apache_beam.io.fileio.WriteToFiles(
          path="some/base/dir", sink=..., destination=lambda x: "events",
          file_naming=lambda *x: "subdir/file.txt"
      ) 
      

      the current fileio implementation calls mkdirs with some/base/dir instead of some/base/dir/subdir, so the subdir directory that the file needs is never created.
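
To make the mismatch concrete, here is a minimal sketch using only the Python standard library. It is not Beam code: the helper names `dir_passed_to_mkdirs` and `dir_actually_needed` are hypothetical and only illustrate which directory each strategy would hand to mkdirs, assuming "/"-separated paths.

```python
import posixpath


def dir_passed_to_mkdirs(base_path, file_name):
    # Behaviour described in this report: only the base path is created,
    # regardless of any directory component in the file name.
    return base_path


def dir_actually_needed(base_path, file_name):
    # Directory that must exist before the file can be written:
    # dirname of the full file name (base path joined with the file name).
    return posixpath.dirname(posixpath.join(base_path, file_name))


base = "some/base/dir"
name = "subdir/file.txt"  # what file_naming returns in the example above

print(dir_passed_to_mkdirs(base, name))  # some/base/dir
print(dir_actually_needed(base, name))   # some/base/dir/subdir
```

When the file name has no directory component, both helpers return the same path, which would explain why the problem only shows up with nested file names.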

      The bug is currently at https://github.com/apache/beam/blob/67bcf1e16e3fdf68cdea7a4b42b9c003e4b8948c/sdks/python/apache_beam/io/fileio.py#L605.

      ====

      Personally, I would recommend changing the FileSystems interface so that `open` calls `mkdirs` on storages that require parent directories to exist before a file can be written.
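
A minimal sketch of that idea on a local filesystem, using only the Python standard library rather than Beam's FileSystems classes. `create_with_parents` is a hypothetical name, not an existing Beam function. The assumption here is that the storage has real directories; storages without them would typically skip the directory step.

```python
import os
import tempfile


def create_with_parents(path, mode="wb"):
    """Hypothetical helper: open `path` for writing, creating missing parents first."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok=True makes this safe when several writers share a directory.
        os.makedirs(parent, exist_ok=True)
    return open(path, mode)


# Usage: write into a nested directory that does not exist yet.
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "subdir", "file.txt")
    with create_with_parents(target) as handle:
        handle.write(b"event\n")
    print(os.path.isfile(target))  # True
```

Putting the directory creation next to the open call means callers such as fileio would not need to know which directory a given file name implies.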

          People

            Assignee: Unassigned
            Reporter: Javier Domingo Cansino (txomon)