  1. Beam
  2. BEAM-7025

Python pipelines should not be able to use output tags that are not defined in with_outputs.

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

      Description

      Emitting to an output tag that was not declared in with_outputs is an indication that the user has misconfigured their Beam pipeline.

      This is because it's not possible to get a handle to the PCollection produced for that output tag if .with_outputs is not used, so nothing downstream can consume it. This should be disallowed entirely: a runtime exception should be thrown.

      Note:
      The bundle descriptor knows which tags are available for each step, so this can be detected at runtime. But we need to be careful not to check it on every element, for performance reasons.

      I suspect it's possible to detect this statically, but that may require collecting more information.

      But there should already be some code path that collects the elements for the bundle into the different tags as they are output. At that point, at the end of bundle execution, we can check for undeclared tags, which would be cheap.
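
      A rough sketch of the end-of-bundle check described in the note, in plain Python rather than actual SDK code. BundleOutputs, emit, finish_bundle and UndeclaredOutputTagError are hypothetical names made up for illustration; the real collection path in the SDK harness may look quite different. The idea is only that the per-element path stays a plain append, and the comparison against the declared tags runs once per bundle.

```python
from collections import defaultdict


class UndeclaredOutputTagError(RuntimeError):
    """Hypothetical error for a step that emitted to a tag it never declared."""


class BundleOutputs:
    """Hypothetical stand-in for whatever collects a bundle's outputs per tag."""

    def __init__(self, step_name, declared_tags):
        self.step_name = step_name
        # Assumption: the declared tags come from the bundle descriptor.
        self.declared_tags = frozenset(declared_tags)
        self._by_tag = defaultdict(list)

    def emit(self, tag, element):
        # Hot path: one dict lookup and one append, no validation per element.
        self._by_tag[tag].append(element)

    def finish_bundle(self):
        # Cold path: runs once per bundle, cost scales with the number of
        # distinct tags seen, not with the number of elements.
        undeclared = sorted(set(self._by_tag) - self.declared_tags)
        if undeclared:
            raise UndeclaredOutputTagError(
                f"Step {self.step_name!r} emitted to undeclared output tag(s) "
                f"{undeclared}; declared tags are {sorted(self.declared_tags)}."
            )
        return dict(self._by_tag)


# Usage: every tag used here was declared, so the bundle finishes cleanly.
outputs = BundleOutputs("SplitFn", declared_tags={"main", "odd"})
for n in range(4):
    outputs.emit("odd" if n % 2 else "main", n)
print(outputs.finish_bundle())  # {'main': [0, 2], 'odd': [1, 3]}
```

      The trade-off is that a misconfigured step only fails after it has processed a whole bundle, not on the first bad element, which seems acceptable given the per-element cost it avoids.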

            People

            • Assignee: Unassigned
            • Reporter: Alex Amato (ajamato@google.com)
