  1. Apache Arrow
  2. ARROW-15060

[R] open_dataset() on csv files lacks support for compressed files


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      Using open_dataset() on S3 buckets of CSV files is game-changing, particularly with all the additional support for database / dplyr operations over the remote connection, and with the widespread adoption of S3 buckets even by old-school big data providers like NOAA.

       

      It's not uncommon to encounter buckets that store files as *.csv.gz, and open_dataset() currently cannot read them.  I know that technically storing pre-compressed files should be unnecessary, since the server can compress "in flight", but in practice such files are rarely a problem for R users, because R's `connection` class automatically detects and gunzips compressed files (over either POSIX or HTTP connections).  It would be really great if arrow could handle this case too.
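
      To illustrate the kind of transparent handling requested here, below is a minimal sketch in Python (standard library only). It is not arrow's API and not how R's `connection` class is implemented; it only shows the general idea of sniffing the gzip magic number and decompressing before CSV parsing. The helper names `open_maybe_gzipped` and `read_csv_rows`, and the sample data, are hypothetical and exist only for this example.

```python
import csv
import gzip
import io

# The first two bytes of every gzip stream (per RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzipped(raw: bytes) -> io.StringIO:
    """Return a text stream over `raw`, gunzipping first if it looks gzipped.

    Hypothetical helper: detection is by magic number, not by file extension.
    Assumes the payload is UTF-8 text.
    """
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return io.StringIO(raw.decode("utf-8"))


def read_csv_rows(raw: bytes) -> list:
    """Parse CSV bytes (plain or gzipped) into a list of row dictionaries."""
    return list(csv.DictReader(open_maybe_gzipped(raw)))


# Made-up sample data standing in for a file fetched from a bucket.
plain = b"station,temp\nA,1.5\nB,2.0\n"
compressed = gzip.compress(plain)

rows_plain = read_csv_rows(plain)
rows_gz = read_csv_rows(compressed)
```

      With this approach the caller does not need to know whether a given object is compressed: both inputs above parse to the same rows. A dataset reader could apply the same check per file, or fall back to the file extension when the first bytes are not available up front.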

            People

              Assignee: Unassigned
              Reporter: Carl Boettiger (cboettig)
