Beam / BEAM-5417: FileSystems.match behaviour diff between GCS and local file system

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.5.0, 2.6.0
    • Fix Version/s: 2.8.0
    • Component/s: sdk-py-core

    Description

      Given the directory structure:

       

      .
      ├── filesystem-match-test
      │   ├── a
      │   │   └── file.txt
      │   └── b
      │       └── file.txt
      └── filesystem-match-test.py
      

       

      Where filesystem-match-test.py contains:

      from __future__ import print_function
      import os
      import posixpath
      
      from apache_beam.io.filesystem import MatchResult
      from apache_beam.io.filesystems import FileSystems
      
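      # Each base is joined with the same relative glob pattern.
      BASES = [
          # Local: the directory containing this script.
          os.path.join(os.path.dirname(__file__), "./"),
          # GCS: placeholder bucket holding the same tree.
          "gs://my-bucket/test/",
      ]
      pattern = "filesystem-match-test/*/file.txt"
      for base_path in BASES:
          full_pattern = posixpath.join(base_path, pattern)
          print("full_pattern: {}".format(full_pattern))
          # match() takes a list of patterns and returns one MatchResult per pattern.
          match_result = FileSystems.match([full_pattern])[0]  # type: MatchResult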
          print("metadata list: {}".format(match_result.metadata_list))
      

      Running python filesystem-match-test.py does not match any files locally, but does match files on GCS:

      full_pattern: ./filesystem-match-test/*/file.txt
      metadata list: []
      full_pattern: gs://my-bucket/test/filesystem-match-test/*/file.txt
      metadata list: [FileMetadata(gs://my-bucket/test/filesystem-match-test/a/file.txt, 6), FileMetadata(gs://my-bucket/test/filesystem-match-test/b/file.txt, 6)]
      

      The expected result is that a/file.txt and b/file.txt are matched for both patterns.
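      As a reference point for the expected local result, here is a minimal sketch that uses only the Python 3 standard library. It recreates the reported tree in a temporary directory and lists what glob.glob returns for the same relative pattern. It assumes a POSIX system. The helper names make_fixture and expected_matches are hypothetical, and the 6-byte file contents are an assumption chosen to match the sizes in the GCS metadata above. The sketch makes no claim about how Beam's LocalFileSystem implements matching.

```python
import glob
import os
import posixpath
import tempfile

PATTERN = "filesystem-match-test/*/file.txt"


def make_fixture(root):
    """Create the reported tree under `root`: filesystem-match-test/{a,b}/file.txt.

    Hypothetical helper for reproducing the issue locally.
    """
    for sub in ("a", "b"):
        directory = os.path.join(root, "filesystem-match-test", sub)
        os.makedirs(directory, exist_ok=True)
        # Assumed 6-byte payload, matching the sizes in the reported GCS metadata.
        with open(os.path.join(directory, "file.txt"), "wb") as f:
            f.write(b"hello\n")


def expected_matches(base_path, pattern=PATTERN):
    """Return the sorted paths stdlib glob finds for base_path joined with pattern.

    Hypothetical helper; used only as a reference, not as Beam's behaviour.
    """
    full_pattern = posixpath.join(base_path, pattern)
    # glob expands the "*" directory component to each subdirectory (a, b).
    return sorted(glob.glob(full_pattern))


if __name__ == "__main__":
    # Work in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as root:
        make_fixture(root)
        for path in expected_matches(root):
            print(os.path.relpath(path, root))
```

      Run as a script, it prints filesystem-match-test/a/file.txt and filesystem-match-test/b/file.txt, the same two files the report expects FileSystems.match to return for the local base.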


            People
              Udi Meiri (udim)
              Joar Wandborg (joar)


              Time Tracking
                Original Estimate: Not Specified
                Remaining Estimate: 0h
                Time Spent: 7h