Description
The parameter
bundle_size_in_elements
in SyntheticSources in Python in specific situations becomes `float` instead of `int` what causes failure on Dataflow:
Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work work_executor.execute() File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 198, in execute self._split_task) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 206, in _perform_source_split_considering_api_limits desired_bundle_size) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 243, in _perform_source_split for split in source.split(desired_bundle_size): File "/usr/local/lib/python2.7/dist-packages/apache_beam/testing/synthetic_pipeline.py", line 222, in split bundle_size_in_elements): TypeError: range() integer step argument expected, got float.
Debugging showed that on Dataflow following line causes this problem (line 213-214):
max(1, self._num_records / self._initial_splitting_num_bundles)
.
In line 218, there is:
math.floor(math.sqrt(self._num_records))
which also returns float.
In 222 line bundle_size_in_elements is used to range method which requires int.
Attachments
Issue Links
- links to