Beam / BEAM-10395

Dataflow runner should deduplicate files to stage by destination

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.24.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      If a pipeline contains multiple files to stage that share the same destination path, the Dataflow runner tries to stage all of them in parallel, and the upload usually fails because the uploads conflict with each other.

      The runner should upload only one file per destination and, ideally, also verify that the sources sharing a destination are identical.
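
      A minimal sketch of the requested behavior, not the actual fix that landed in the runner. The class name StagingDedup, the FileToStage type, and its fields are hypothetical and are not part of the Beam API. It assumes each file already carries a content hash, so that two sources with the same destination can be compared without re-reading them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StagingDedup {

  /** Hypothetical value type: a local source, its content hash, and its staging destination. */
  public static final class FileToStage {
    public final String source;
    public final String contentHash;
    public final String destination;

    public FileToStage(String source, String contentHash, String destination) {
      this.source = source;
      this.contentHash = contentHash;
      this.destination = destination;
    }
  }

  /**
   * Keeps the first file seen for each destination, preserving input order.
   * Throws if two files share a destination but have different content hashes,
   * since uploading either one would silently drop the other.
   */
  public static List<FileToStage> dedupeByDestination(List<FileToStage> files) {
    Map<String, FileToStage> byDestination = new LinkedHashMap<>();
    for (FileToStage file : files) {
      // putIfAbsent returns the earlier entry, or null if this destination is new.
      FileToStage existing = byDestination.putIfAbsent(file.destination, file);
      if (existing != null && !existing.contentHash.equals(file.contentHash)) {
        throw new IllegalArgumentException(
            "Conflicting sources for destination " + file.destination
                + ": " + existing.source + " and " + file.source);
      }
    }
    return new ArrayList<>(byDestination.values());
  }
}
```

      With this approach, the parallel upload step would receive at most one entry per destination, so two uploads can no longer race on the same path. Whether a content mismatch should fail the job or only log a warning is a design choice this issue leaves open.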

            People

              Assignee: Unassigned
              Reporter: Steve Niemitz
              Votes: 0
              Watchers: 3

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 50m