  1. Apache Arrow
  2. ARROW-18436

[Python] `FileSystem.from_uri` doesn't decode %-encoded characters in path

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 10.0.1
    • Fix Version/s: 11.0.0
    • Component/s: C++, Python
    • Environment:
      - OS: macOS
      - `python=3.9.15:h709bd14_0_cpython` (installed from conda-forge)
      - `pyarrow=10.0.1:py39h2db5b05_1_cpu` (installed from conda-forge)

    Description

      When attempting to create a new filesystem object from a URI that points at a public dataset in S3, where there is a space in the object path (the bucket is `nyc-tlc`, the key prefix is `trip data`), an error is raised.

       

      Here's a minimal reproducer:

      from pyarrow.fs import FileSystem
      result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet") 

      which fails with the following traceback:

       

      Traceback (most recent call last):
        File "/Users/james/projects/dask/dask/test.py", line 3, in <module>
          result = FileSystem.from_uri("s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet")
        File "pyarrow/_fs.pyx", line 470, in pyarrow._fs.FileSystem.from_uri
        File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
        File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
      pyarrow.lib.ArrowInvalid: Cannot parse URI: 's3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet'

       

      Note that things work if I use a different dataset that doesn't have a space in the URI, or if I replace the portion of the URI that has a space with a `*` wildcard:

       

      from pyarrow.fs import FileSystem
      result = FileSystem.from_uri("s3://ursa-labs-taxi-data/2009/01/data.parquet") # works
      result = FileSystem.from_uri("s3://nyc-tlc/*/fhvhv_tripdata_2022-06.parquet") # works

       

      The wildcard URI isn't necessarily equivalent to the original failing URI, but I think it highlights that the space is what causes the problem.
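
A possible workaround, sketched here as an illustration and not taken from the issue itself: percent-encode the path component before handing the URI to `FileSystem.from_uri`. The helper name `encode_uri_path` is hypothetical, and the sketch assumes the input URI is not already percent-encoded (otherwise `%` would be encoded a second time). It only uses the standard library, so it shows the string transformation, not pyarrow's behavior.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_uri_path(uri: str) -> str:
    """Percent-encode only the path component of a URI (hypothetical helper).

    Assumes the path is not already percent-encoded.
    """
    parts = urlsplit(uri)
    # quote() leaves "/" untouched by default and turns a space into "%20".
    return urlunsplit(parts._replace(path=quote(parts.path)))


uri = "s3://nyc-tlc/trip data/fhvhv_tripdata_2022-06.parquet"
encoded = encode_uri_path(uri)
print(encoded)

# unquote() reverses the encoding, which may be useful if a returned
# path still contains "%20".
decoded_path = unquote(urlsplit(encoded).path)
print(decoded_path)
```

The encoded string could then be passed to `FileSystem.from_uri`. As the issue title suggests, on affected versions the path returned alongside the filesystem may still contain `%20`, in which case `unquote` would restore the space.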

            People

              Assignee: Antoine Pitrou (apitrou)
              Reporter: James Bourbeau (jrbourbeau)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m