Beam / BEAM-8920

Go SDK: faster transforms/filter.Distinct with CombinePerKey

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Not applicable
    • Component/s: sdk-go
    • Labels: None

    Description

      The current implementation:

      1. Add a fixed value 1 to each element: P<T> --> P<<T, 1>>
      2. Group by key: P<<T, 1>> --> GBK<T, 1>
      3. Drop the value: GBK<T, 1> --> P<distinct T>

      The proposed implementation:

      1. Same as step 1 above.
      2. Combine per key: P<<T, 1>> --> P<<distinct T, 1>>
      3. Drop the value, as before: P<<distinct T, 1>> --> P<distinct T>

      CombinePerKey runs a combining ParDo before the GBK, so duplicate keys are merged before the shuffle and the shuffle size shrinks.
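
      The difference in shuffle size can be illustrated with a small plain-Go simulation. This is only a sketch and does not use the Beam Go SDK: the `kv` type, the notion of a "bundle" as a slice of strings, and the functions `distinctGBK`, `distinctCombine`, and `sortedKeys` are all hypothetical names introduced here. It assumes the pre-GBK combine runs once per bundle, which in practice depends on the runner.

```go
package main

import (
	"fmt"
	"sort"
)

// kv is a hypothetical stand-in for a <T, 1> pair (not a Beam type).
type kv struct {
	key string
	val int
}

// sortedKeys returns the keys of m in sorted order so results are deterministic.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// distinctGBK mimics the current approach: every <T, 1> pair from every
// bundle is shuffled, then grouped by key, then the values are dropped.
// It returns the distinct elements and the number of shuffled records.
func distinctGBK(bundles [][]string) ([]string, int) {
	var shuffled []kv
	for _, b := range bundles {
		for _, e := range b {
			shuffled = append(shuffled, kv{e, 1}) // step 1: add fixed value 1
		}
	}
	groups := map[string][]int{} // step 2: group by key
	for _, p := range shuffled {
		groups[p.key] = append(groups[p.key], p.val)
	}
	return sortedKeys(groups), len(shuffled) // step 3: drop the values
}

// distinctCombine mimics the proposed approach: each bundle pre-combines
// its pairs per key before the shuffle, so at most one record per key per
// bundle is shuffled; a final merge combines the partial results.
func distinctCombine(bundles [][]string) ([]string, int) {
	shuffled := 0
	merged := map[string]int{}
	for _, b := range bundles {
		partial := map[string]int{} // pre-shuffle combine within one bundle
		for _, e := range b {
			partial[e] = 1 // combining 1 with 1 keeps the fixed value 1
		}
		shuffled += len(partial)
		for k, v := range partial {
			merged[k] = v // post-shuffle merge of partial results
		}
	}
	return sortedKeys(merged), shuffled
}

func main() {
	bundles := [][]string{{"a", "a", "b", "a"}, {"b", "b", "c", "b"}}
	g, gn := distinctGBK(bundles)
	c, cn := distinctCombine(bundles)
	fmt.Println(g, gn) // prints "[a b c] 8"
	fmt.Println(c, cn) // prints "[a b c] 4"
}
```

      Both strategies yield the same distinct elements; in this toy input the combine variant shuffles 4 records instead of 8, and the gap grows with the number of duplicates per bundle.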

            People

              Assignee: Unassigned
              Reporter: stephydx D. Yang
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m