Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
Using open_dataset() on S3 buckets of CSV files is a game changer, particularly with all the additional support for database / dplyr operations over the remote connection and the widespread adoption of S3 buckets even by old-school big data providers like NOAA.
It's not uncommon to encounter buckets of *.csv.gz files. I know that storing files pre-compressed should technically be unnecessary, since the server can compress "in flight", but it is usually not a problem for R users because R's `connection` class automatically detects and gunzips compressed files (over either POSIX or HTTP connections). It would be really great if arrow could handle this case too.
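To make the requested behavior concrete, here is a minimal sketch of "detect and gunzip" for a local file. It is written in Python using only the standard library, not R or arrow, and it is not how R's `connection` class or arrow is implemented. The names `open_maybe_gzipped`, `read_csv_rows`, and `demo`, and the sample `station,temp` data, are hypothetical and exist only for illustration. The sketch assumes that checking the two-byte gzip magic number is a good enough test for compression.

```python
import csv
import gzip
import os
import tempfile

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(path):
    """Open `path` as text, decompressing transparently if it holds gzip data."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, mode="rt", newline="", encoding="utf-8")
    return open(path, mode="r", newline="", encoding="utf-8")


def read_csv_rows(path):
    """Return all rows of a CSV file, whether or not it is gzip-compressed."""
    with open_maybe_gzipped(path) as handle:
        return list(csv.reader(handle))


def demo():
    """Write one table as obs.csv and obs.csv.gz, then read both back."""
    text = "station,temp\nA,1.5\nB,2.0\n"  # made-up sample data
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "obs.csv")
        packed = os.path.join(tmp, "obs.csv.gz")
        with open(plain, "w", newline="", encoding="utf-8") as f:
            f.write(text)
        with gzip.open(packed, "wt", newline="", encoding="utf-8") as f:
            f.write(text)
        return read_csv_rows(plain), read_csv_rows(packed)
```

The caller of `read_csv_rows` never has to say whether the file is compressed, which is the convenience this issue asks for when opening *.csv.gz files in a dataset.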
Issue Links
- relates to ARROW-15148 [C++][Dataset] Add option to write compressed CSV (Open)