Beam / BEAM-13624

pipeline_fragment incorrectly prunes producer transform

Details

    • Type: Bug
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core

    Description

      Unfortunately I haven't been able to diagnose the exact issue or come up with a minimal repro. I only have code that reproduces it in https://github.com/apache/beam/pull/16445.

      That PR adds support for value_counts(bins) in the DataFrame API. For reasons I haven't pinned down, this interacts poorly with pipeline pruning in interactive Beam: rehydrating the pipeline raises an error saying that a PCollection's producer is missing. The PR also adds a test to transform_test.py that replicates the issue, as well as a temporary mitigation in pipeline_fragment.py. I think the mitigation effectively disables pipeline pruning, so it likely shouldn't be merged.
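      The kind of failure described above can be illustrated with a toy model. This is a hypothetical sketch, not Beam's pipeline_fragment code, and it does not reproduce or diagnose the actual bug. It assumes a pipeline is just a dict mapping transform names to their input and output PCollection names; `prune`, `missing_producers`, and the example graph are invented for illustration. It shows the invariant pruning has to preserve: every PCollection consumed by a kept transform must also have its producing transform kept, otherwise rehydration finds a PCollection with no producer.

```python
from typing import Dict, List, Set

# Hypothetical toy model, NOT Beam's pipeline_fragment implementation:
# each transform name maps to {"inputs": [...], "outputs": [...]},
# where the lists hold PCollection names.
Transform = Dict[str, List[str]]


def prune(transforms: Dict[str, Transform], targets: Set[str]) -> Set[str]:
    """Return the transforms needed to compute the target PCollections.

    Walks backward from each target PCollection, keeping the transform that
    produces it and then, transitively, the producers of that transform's
    inputs. Transforms not reachable this way are pruned.
    """
    producer = {
        pcoll: name
        for name, t in transforms.items()
        for pcoll in t["outputs"]
    }
    kept: Set[str] = set()
    stack = list(targets)
    while stack:
        pcoll = stack.pop()
        name = producer.get(pcoll)
        if name is None or name in kept:
            continue
        kept.add(name)
        stack.extend(transforms[name]["inputs"])
    return kept


def missing_producers(transforms: Dict[str, Transform], kept: Set[str]) -> Set[str]:
    """PCollections consumed by kept transforms whose producer was pruned."""
    produced = {p for name in kept for p in transforms[name]["outputs"]}
    return {
        p
        for name in kept
        for p in transforms[name]["inputs"]
        if p not in produced
    }


# Hypothetical example graph: "Combine" consumes both "raw" and "derived".
graph = {
    "Read": {"inputs": [], "outputs": ["raw"]},
    "Derive": {"inputs": ["raw"], "outputs": ["derived"]},
    "Combine": {"inputs": ["raw", "derived"], "outputs": ["combined"]},
    "Unrelated": {"inputs": ["raw"], "outputs": ["other"]},
}

kept = prune(graph, {"combined"})
print(sorted(kept))  # ['Combine', 'Derive', 'Read']
print(missing_producers(graph, kept))  # set()
# A faulty prune that dropped "Derive" would leave "derived" without a producer:
print(missing_producers(graph, {"Read", "Combine"}))  # {'derived'}
```

      A check like `missing_producers` run on the pruned fragment before rehydration would surface this class of error at pruning time rather than later; whether pipeline_fragment.py has an equivalent check is not stated here.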


    People

      Assignee: Unassigned
      Reporter: Brian Hulette (bhulette)
