Apache Arrow / ARROW-6469

[Python] HDFS documentation does not mention HDFS short circuit readings

Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Won't Do
    • None
    • None
    • Python

    Description

      Because PyArrow uses libhdfs underneath, files read from HDFS would be expected to use short-circuit local reads.

      However, the PyArrow documentation does not explain whether this feature is supported, under which conditions, or whether it works without any additional configuration.

      For instance, I'm interested in the use case of relying on short-circuit reads to load a subset of the columns of a Parquet file stored in HDFS into a DataFrame.
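
A minimal sketch of the use case described above, for illustration only. It assumes a recent PyArrow where `pyarrow.fs.HadoopFileSystem` accepts an `extra_conf` dictionary that is forwarded to libhdfs, and it uses the standard Hadoop client properties `dfs.client.read.shortcircuit` and `dfs.domain.socket.path`. The helper names, host, port, file path, column names, and socket path are all hypothetical placeholders. Whether short-circuit reads are actually used still depends on the cluster and client setup (for example, the client running on the same host as the DataNode and the native Hadoop library being available); this sketch does not verify that.

```python
def short_circuit_conf(socket_path):
    """Build the Hadoop client properties that request short-circuit reads.

    The property names are standard HDFS client settings; socket_path must
    match the DataNode's own dfs.domain.socket.path setting.
    """
    return {
        "dfs.client.read.shortcircuit": "true",
        "dfs.domain.socket.path": socket_path,
    }


def read_parquet_columns(host, port, path, columns, socket_path):
    """Read only the given columns of a Parquet file in HDFS into a DataFrame.

    Hypothetical helper: passing the properties through extra_conf is an
    assumption about how the client could be configured, not documented
    PyArrow guidance on short-circuit reads.
    """
    # Imported here so the configuration helper above works without PyArrow.
    import pyarrow.fs as pafs
    import pyarrow.parquet as pq

    hdfs = pafs.HadoopFileSystem(
        host, port, extra_conf=short_circuit_conf(socket_path)
    )
    # Column projection: only the requested columns are read from the file.
    table = pq.read_table(path, columns=columns, filesystem=hdfs)
    return table.to_pandas()


# Example call (placeholder values, requires a reachable HDFS cluster):
# df = read_parquet_columns(
#     "namenode.example.com", 8020, "/data/events.parquet",
#     ["user_id", "timestamp"], "/var/lib/hadoop-hdfs/dn_socket",
# )
```

The same properties can instead be set in the client's `hdfs-site.xml`, in which case no `extra_conf` would be needed.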

       

          People

            Assignee: Unassigned
            Reporter: Paulo Roberto Cerioni (prcerioni)
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: