Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-6115

SyntheticSource bundle size parameter sometimes is casted to invalid type

Details

    • Bug
    • Status: Resolved
    • P3
    • Resolution: Fixed
    • None
    • 2.9.0
    • testing
    • None

    Description

      The parameter

      bundle_size_in_elements

      in SyntheticSources in Python in specific situations becomes `float` instead of `int` what causes failure on Dataflow:

      Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
      work_executor.execute()
      File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 198, in execute
      self._split_task)
      File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 206, in _perform_source_split_considering_api_limits
      desired_bundle_size)
      File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 243, in _perform_source_split
      for split in source.split(desired_bundle_size):
      File "/usr/local/lib/python2.7/dist-packages/apache_beam/testing/synthetic_pipeline.py", line 222, in split
      bundle_size_in_elements):
      TypeError: range() integer step argument expected, got float.

       
      Debugging showed that on Dataflow following line causes this problem (line 213-214):

      max(1, self._num_records / self._initial_splitting_num_bundles)

      .
      In line 218, there is:

      math.floor(math.sqrt(self._num_records))

      which also returns float.

      In 222 line bundle_size_in_elements is used to range method which requires int.

      Attachments

        Issue Links

          Activity

            People

              kasiak Kasia Kucharczyk
              kasiak Kasia Kucharczyk
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m