Details
Bug
Status: Triage Needed
P3
Resolution: Fixed
2.27.0
None
Description
In versions 2.27.0 and later, as introduced by this PR: https://github.com/apache/beam/pull/13302/files#diff-33b0b6b112036df96f341aa83b88efba9215ec14dfabc9db9e9ffe66a23154a2R55
The parquetio module parses the pyarrow version like this:
ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
(see https://github.com/apache/beam/blob/v2.27.0/sdks/python/apache_beam/io/parquetio.py#L55)
This does not support all PEP-440 compliant versions: https://peps.python.org/pep-0440/
For example, if pyarrow were to have a version like this: 1.0.0+abc.7, then this module would fail:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/local/lib/python3.7/runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/__init__.py", line 93, in <module>
    from apache_beam import io
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/__init__.py", line 28, in <module>
    from apache_beam.io.parquetio import *
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/parquetio.py", line 53, in <module>
    ARROW_MAJOR_VERSION, _, _ = map(int, pa.__version__.split('.'))
ValueError: invalid literal for int() with base 10: '0+abc.7'
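The failure can be reproduced without pyarrow installed. The sketch below is only an illustration: naive_major_version and fails_to_parse are hypothetical helper names, not part of Beam, and they apply the same unpacking pattern to a plain string instead of pa.__version__.

```python
def naive_major_version(version):
    # Hypothetical helper: same pattern as the parquetio line quoted above,
    # applied to a plain string instead of pa.__version__.
    major, _, _ = map(int, version.split('.'))
    return major


def fails_to_parse(version):
    # True when the naive pattern raises ValueError for this version string.
    try:
        naive_major_version(version)
    except ValueError:
        return True
    return False


print(naive_major_version('1.0.0'))    # a plain X.Y.Z release parses: 1
print(fails_to_parse('1.0.0+abc.7'))   # a local version label does not: True
```

Any version string that is not exactly three dot-separated integers hits this path, so other PEP 440 forms (pre-releases, dev releases) are likely to be affected as well.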
In practice, this would fail when somebody forks pyarrow, like yours truly.
We can fix this by using pkg_resources.parse_version, which is PEP-440 compliant starting with setuptools 6.0.
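A minimal sketch of what the fix could look like, not the actual patch. It assumes a hypothetical helper named arrow_major_version, and it assumes that the object returned by parse_version exposes a release tuple, which may not hold for older setuptools releases. The regex fallback is also an assumption, added so the sketch still runs where pkg_resources cannot be imported.

```python
import re

try:
    # pkg_resources ships with setuptools; this is the function proposed above.
    from pkg_resources import parse_version
except ImportError:  # setuptools is not installed in this environment
    parse_version = None


def arrow_major_version(version):
    """Hypothetical helper: major release number of a PEP 440 version string."""
    if parse_version is not None:
        # Assumption: the parsed object has a `release` tuple such as (1, 0, 0).
        release = getattr(parse_version(version), 'release', None)
        if release:
            return release[0]
    # Assumed fallback, not PEP 440 complete: skip an optional "v" prefix and
    # an optional epoch ("N!"), then read the first run of digits.
    match = re.match(r'v?(?:\d+!)?(\d+)', version.strip())
    if match is None:
        raise ValueError('cannot determine major version from %r' % version)
    return int(match.group(1))


print(arrow_major_version('1.0.0'))        # 1
print(arrow_major_version('1.0.0+abc.7'))  # 1
```

In parquetio this would replace the three-way unpacking with something like ARROW_MAJOR_VERSION = arrow_major_version(pa.__version__), so a forked pyarrow with a local version label no longer breaks the import of apache_beam.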
If the maintainers agree with this change, I would be willing to submit a PR.