Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 4.0.0
Description
======================================================================
ERROR: test_in_memory_data_source (pyspark.sql.tests.test_python_datasource.PythonDataSourceTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/sql/tests/test_python_datasource.py", line 234, in test_in_memory_data_source
    self.assertEqual(df.rdd.getNumPartitions(), 3)
  File "/__w/spark/spark/python/pyspark/sql/dataframe.py", line 224, in rdd
    jrdd = self._jdf.javaToPython()
  File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/__w/spark/spark/python/pyspark/errors/exceptions/captured.py", line 215, in deco
    return f(*a, **kw)
  File "/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o208.javaToPython.
: org.apache.spark.SparkException: Error from python worker:
  Traceback (most recent call last):
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 61, in require_minimum_pyarrow_version
      import pyarrow
  ModuleNotFoundError: No module named 'pyarrow'

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 197, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/usr/local/pypy/pypy3.8/lib/pypy3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 36, in <module>
    File "/usr/local/pypy/pypy3.8/lib/pypy3.8/importlib/__init__.py", line 127, in import_module
      return _bootstrap._gcd_import(name[level:], package, level)
    File "<frozen importlib._bootstrap>", line 1023, in _gcd_import
    File "<frozen importlib._bootstrap>", line 1000, in _find_and_load
    File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
    File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
    File "<frozen importlib._bootstrap>", line 627, in _load_backward_compatible
    File "<builtin>/frozen zipimport", line 259, in load_module
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/worker/plan_data_source_read.py", line 33, in <module>
      from pyspark.sql.connect.conversion import ArrowTableToRowsConversion, LocalDataToArrowConversion
    File "<builtin>/frozen zipimport", line 259, in load_module
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/conversion.py", line 20, in <module>
      check_dependencies(__name__)
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/connect/utils.py", line 36, in check_dependencies
      require_minimum_pyarrow_version()
    File "/__w/spark/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 68, in require_minimum_pyarrow_version
      raise PySparkImportError(
  pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED] PyArrow >= 4.0.0 must be installed; however, it was not found.
  PYTHONPATH was: /__w/spark/spark/python/lib/pyspark.zip:/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip:/__w/spark/spark/python/lib/py4j-0.10.9.7-src.zip:/__w/spark/spark/python/:
https://github.com/apache/spark/actions/runs/7557652490/job/20577472214
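The traceback shows the test failing on the PyPy CI image because the Python data source read path imports `pyspark.sql.connect.conversion`, whose dependency check (`require_minimum_pyarrow_version`) raises when PyArrow is absent. A common way to handle this in test suites is to detect PyArrow at import time and skip the affected tests; the sketch below illustrates that pattern (the helper names `have_pyarrow` and `pyarrow_requirement_message` are assumptions for illustration, not necessarily the exact fix that was merged):

```python
import unittest

# Probe for PyArrow once at import time; tests that need Arrow
# conversion are skipped when it is unavailable (e.g. on PyPy images
# where PyArrow is not installed).
try:
    import pyarrow  # noqa: F401

    have_pyarrow = True
    pyarrow_requirement_message = None
except ImportError:
    have_pyarrow = False
    pyarrow_requirement_message = "PyArrow is not installed"


@unittest.skipIf(not have_pyarrow, pyarrow_requirement_message)
class PythonDataSourceTests(unittest.TestCase):
    # Hypothetical stand-in for the real test; the actual test builds a
    # DataFrame from an in-memory Python data source and checks
    # df.rdd.getNumPartitions() == 3.
    def test_in_memory_data_source(self):
        self.assertEqual(3, 3)
```

With this guard, an environment without PyArrow reports the test as skipped instead of failing deep inside the Python worker with a `PySparkImportError`.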