  1. Beam
  2. BEAM-7709

Flattening multiple outputs of a ParDoN fails

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Not applicable
    • Not applicable
    • Component/s: sdk-go
    • None

    Description

      If a user calls a beam.ParDoN with N > 2 and then passes one or more of its outputs to a Flatten, and that Flatten is executed SDK side, the plan translation currently creates multiple flatten nodes. Each of those nodes then causes the downstream ParDo (the DoFn that consumes the Flatten's output) to be initialized, so it is initialized multiple times for a single bundle.

      The fix is to pre-emptively populate the input links with the first flatten node created, so that subsequent tracings of the plan reuse that node, the same way the Go direct runner does [1]. The change would be made in the exec translate code [2].

      [1] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/direct/direct.go#L299

      [2] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/translate.go#L493

            People

              Assignee: lostluck Robert Burke
              Reporter: lostluck Robert Burke
              Votes: 0
              Watchers: 1


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1.5h