Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version/s: 2.4.5
- Fix Version/s: None
- Component/s: None
Description
The Spark documentation (https://spark.apache.org/docs/latest/) has this text:
Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).
This suggests that Spark is compatible with Python 3.8, which is not true. For example, in the latest ubuntu:18.04 Docker image:
# In a fresh ubuntu:18.04 container (Python 3.6 is the default python3):
apt-get update
apt-get install -y python3.8 python3-pip
# Installs pyspark for the default Python 3.6 interpreter:
pip3 install pyspark
# Installs pyspark for Python 3.8 and attempts to import it:
python3.8 -m pip install pyspark
python3.8 -c 'import pyspark'
The final import fails with:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/pyspark/__init__.py", line 51, in <module>
    from pyspark.context import SparkContext
  File "/usr/local/lib/python3.8/dist-packages/pyspark/context.py", line 31, in <module>
    from pyspark import accumulators
  File "/usr/local/lib/python3.8/dist-packages/pyspark/accumulators.py", line 97, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "/usr/local/lib/python3.8/dist-packages/pyspark/serializers.py", line 72, in <module>
    from pyspark import cloudpickle
  File "/usr/local/lib/python3.8/dist-packages/pyspark/cloudpickle.py", line 145, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/usr/local/lib/python3.8/dist-packages/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)
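The root cause is that Python 3.8 changed the `types.CodeType` constructor: PEP 570 added a new `posonlyargcount` parameter in second position, so positional argument lists written for 3.7's signature (as in the bundled cloudpickle) no longer line up on 3.8. A minimal sketch of a version-aware construction, for illustration only (this is not the actual pyspark/cloudpickle code; `clone_code` is a hypothetical helper):

import sys
import types

def clone_code(code, new_name):
    # Return a copy of a code object with a different co_name.
    if sys.version_info >= (3, 8):
        # CodeType.replace() was added in 3.8 and sidesteps the
        # changed positional signature entirely.
        return code.replace(co_name=new_name)
    # Pre-3.8: rebuild positionally with the old argument layout,
    # the same style of call that breaks on 3.8 in the traceback above.
    return types.CodeType(
        code.co_argcount, code.co_kwonlyargcount, code.co_nlocals,
        code.co_stacksize, code.co_flags, code.co_code, code.co_consts,
        code.co_names, code.co_varnames, code.co_filename, new_name,
        code.co_firstlineno, code.co_lnotab, code.co_freevars,
        code.co_cellvars,
    )

if __name__ == "__main__":
    f = lambda x: x + 1
    print(clone_code(f.__code__, "renamed").co_name)  # -> "renamed"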
I propose the documentation be updated to say "Python 3.4 to 3.7". I also propose that pyspark's `setup.py` include:
python_requires=">=3.6,<3.8",
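For context, a minimal sketch of where that constraint would live (illustrative only; the surrounding metadata is not pyspark's actual `setup.py`):

from setuptools import setup

setup(
    name="pyspark",
    version="2.4.5",
    # Reject interpreters the bundled cloudpickle cannot run on, so
    # pip on Python 3.8 skips this release instead of installing a
    # package that fails at import time.
    python_requires=">=3.6,<3.8",
)

pip (9.0 and later) reads this as the `Requires-Python` metadata and will not select the release on an unsupported interpreter.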