Details
- Type: Sub-task
- Status: Resolved
- Priority: P2
- Resolution: Fixed
Description
I'm running the Python SDK against GCP on Python 3.5 and got the following gcsio error while deleting files:
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1077, in <genexpr>
    window.TimestampedValue(v, timestamp.MAX_TIMESTAMP) for v in outputs)
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 315, in finalize_write
    num_threads)
File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/util.py", line 145, in run_using_threadpool
    return pool.map(fn_to_execute, inputs)
File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 644, in get
    raise self._value
File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 299, in _rename_batch
    FileSystems.rename(source_files, destination_files)
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 252, in rename
    return filesystem.rename(source_file_names, destination_file_names)
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsfilesystem.py", line 229, in rename
    copy_statuses = gcsio.GcsIO().copy_batch(batch)
File "/usr/local/lib/python3.5/site-packages/apache_beam/io/gcp/gcsio.py", line 322, in copy_batch
    api_calls = batch_request.Execute(self.client._http)  # pylint: disable=protected-access
File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 222, in Execute
    batch_http_request.Execute(http)
File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 480, in Execute
    self._Execute(http)
File "/usr/local/lib/python3.5/site-packages/apitools/base/py/batch.py", line 450, in _Execute
    mime_response = parser.parsestr(header + response.content)
TypeError: Can't convert 'bytes' object to str implicitly
After looking into the related code in the apitools library, I found that the response.content returned by the HTTP request to GCS is bytes, and apitools does not handle this case. This can block any pipeline that depends on gcsio, and apparently blocks all Dataflow jobs on Python 3.
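A minimal sketch of the failure mode (illustrative values, not the actual apitools code): in Python 3, concatenating str and bytes raises TypeError, whereas Python 2 coerced implicitly. Decoding the body first, as shown below, is one possible fix; the variable names and sample payload here are hypothetical.

```python
# batch.py effectively does parser.parsestr(header + response.content),
# where header is str and response.content is bytes under Python 3.
header = "content-type: text/plain\r\n\r\n"   # parsed batch headers (str)
content = b'{"kind": "storage#objects"}'      # HTTP body (bytes, sample payload)

try:
    combined = header + content  # str + bytes: fails on Python 3
except TypeError as exc:
    print("TypeError:", exc)

# One possible fix: decode the body before concatenating.
combined = header + content.decode("utf-8")
print(combined.endswith('objects"}'))  # → True
```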
This could be another argument for moving off the apitools dependency, as tracked in BEAM-4850.