Beam / BEAM-7574

Spark runner: Combine.perKey performance issues

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: 2.13.0
    • Fix Version/s: 2.15.0
    • Component/s: runner-spark
    • Labels: None

    Description

      The current implementation of Combine.perKey creates an accumulator for each input key and then merges all of these accumulators together. Optimize this by:

      • changing the accumulator from an Iterable to a Map, and using addInput as much as possible
      • moving the window explode before the shuffle (adding the window label to the key for non-merging windows); measure the impact and, if it is substantial, implement this at least for window functions that assign each element to a single window (the global window, or tumbling windows)
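
      The two ideas above can be sketched in plain Python. This is a toy model only, not the Spark runner code: SumFn, combine_by_merging, combine_by_adding, tumbling_window and combine_per_key_and_window are hypothetical names made up for illustration, and the method names merely mirror the usual CombineFn shape (create accumulator, add input, merge accumulators, extract output).

```python
from collections import defaultdict


class SumFn:
    """Toy combine function (hypothetical, not a Beam class)."""

    def create_accumulator(self):
        return 0

    def add_input(self, acc, value):
        return acc + value

    def merge_accumulators(self, accs):
        return sum(accs)

    def extract_output(self, acc):
        return acc


def combine_by_merging(fn, elements):
    """Baseline: build many accumulators per key, then merge them all."""
    per_key = defaultdict(list)
    for key, value in elements:
        # A fresh accumulator is created for every incoming value.
        per_key[key].append(fn.add_input(fn.create_accumulator(), value))
    return {
        key: fn.extract_output(fn.merge_accumulators(accs))
        for key, accs in per_key.items()
    }


def combine_by_adding(fn, elements):
    """Map-based: keep one accumulator per key and feed it via add_input."""
    accs = {}
    for key, value in elements:
        acc = accs[key] if key in accs else fn.create_accumulator()
        accs[key] = fn.add_input(acc, value)
    return {key: fn.extract_output(acc) for key, acc in accs.items()}


def tumbling_window(timestamp, size):
    """Return the start of the fixed-size window containing the timestamp."""
    return timestamp - (timestamp % size)


def combine_per_key_and_window(fn, timestamped, size):
    """Fold the window into the key first, then combine per composite key."""
    keyed = [
        ((key, tumbling_window(ts, size)), value)
        for key, value, ts in timestamped
    ]
    return combine_by_adding(fn, keyed)
```

      Both variants return the same result; the map-based one simply avoids allocating and merging one accumulator per value. Folding the window into the key only works as shown when each element lands in exactly one non-merging window, which is the case the ticket singles out.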

            People

              Assignee: Jan Lukavský
              Reporter: Jan Lukavský
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 10h