Apache Arrow / ARROW-12399

Unable to load libhdfs

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Python

    Description

      I am using pyarrow 3.0.0 with Python 3.7 and Hadoop 2.10.1 on 64-bit Windows 10, and I am getting the error shown below.

      I am using PySpark 3.1.1 and cannot save a DataFrame to HDFS. With PySpark 3.0.0 I was able to save a DataFrame to HDFS.

      Please help:

      import pyarrow as pa
      fs = pa.hdfs.connect(host='localhost', port=9001)
      __main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
          extra_conf=extra_conf
        File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
          extra_conf=extra_conf)
        File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in __init__
          self._connect(host, port, user, kerb_ticket, extra_conf)
        File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
        File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
      OSError: Unable to load libhdfs: The specified module could not be found.
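
      The "Unable to load libhdfs" error suggests the native HDFS client library could not be located. A minimal diagnostic sketch follows, using only the standard library. It rests on assumptions not stated in this report: that pyarrow consults the ARROW_LIBHDFS_DIR and HADOOP_HOME environment variables (the latter under lib/native), that JAVA_HOME and CLASSPATH are also needed, and that the library is named hdfs.dll on Windows. The helper names (libhdfs_filename, candidate_libhdfs_paths, report) are hypothetical and not part of pyarrow.

```python
import os
import platform
from pathlib import Path


def libhdfs_filename(system=None):
    """Return the assumed platform-specific file name of the libhdfs library."""
    system = system or platform.system()
    names = {"Windows": "hdfs.dll", "Darwin": "libhdfs.dylib"}
    return names.get(system, "libhdfs.so")


def candidate_libhdfs_paths(env=None, system=None):
    """List locations where libhdfs is assumed to be searched for, in order."""
    env = os.environ if env is None else env
    name = libhdfs_filename(system)
    candidates = []
    # Assumption: an explicit override directory takes precedence.
    if env.get("ARROW_LIBHDFS_DIR"):
        candidates.append(Path(env["ARROW_LIBHDFS_DIR"]) / name)
    # Assumption: the Hadoop installation's native library directory is next.
    if env.get("HADOOP_HOME"):
        candidates.append(Path(env["HADOOP_HOME"]) / "lib" / "native" / name)
    return candidates


def report(env=None, system=None):
    """Summarize which variables are set and which candidate files exist."""
    env = os.environ if env is None else env
    lines = []
    for var in ("HADOOP_HOME", "ARROW_LIBHDFS_DIR", "JAVA_HOME", "CLASSPATH"):
        lines.append(f"{var} is {'set' if env.get(var) else 'NOT set'}")
    for path in candidate_libhdfs_paths(env, system):
        lines.append(f"{path}: {'found' if path.is_file() else 'missing'}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

      Running this in the same environment as the failing session would show whether any of the assumed variables are unset or whether the library file is absent from the assumed locations. It does not call pyarrow and does not confirm the root cause.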

      Attachments

        1. image-2021-04-15-20-04-50-069.png
          38 kB
          Sukesh Pabolu

          People

            Assignee: Unassigned
            Reporter: Sukesh Pabolu (sukeshpabolu)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated: