Flink / FLINK-14616

Clarify the ordering guarantees in "The Broadcast State Pattern"


Details

    Description

      When discussing the order of events, the documentation for The Broadcast State Pattern currently states that downstream tasks must not assume that broadcast events arrive in order. This seems imprecise. According to the response I got from Fabian Hueske to a question I sent to the Flink user mailing list:

      The order of broadcasted inputs is not guaranteed when the operator that broadcasts its output has a parallelism > 1 because the tasks that receive the broadcasted input consume the records in "random" order from their input channels.

      In particular, when the parallelism of the broadcasting operator is 1, the order is guaranteed.
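
      The effect described above can be illustrated without Flink. The following is a minimal sketch in plain Java, not Flink code: the class `BroadcastOrderSketch` and its `receive` method are hypothetical names, and a seeded `Random` stands in for the "random" order in which a receiving task polls its input channels. Each inner list models one input channel, that is, one parallel instance of the broadcasting operator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class BroadcastOrderSketch {

    /**
     * Simulates one receiving task. Each inner list is one input channel,
     * holding records in the order the upstream task sent them. The receiver
     * repeatedly picks a non-empty channel at random and takes its head.
     */
    public static List<Integer> receive(List<List<Integer>> channels, long seed) {
        Random rnd = new Random(seed);
        List<Deque<Integer>> open = new ArrayList<>();
        for (List<Integer> channel : channels) {
            if (!channel.isEmpty()) {
                open.add(new ArrayDeque<>(channel));
            }
        }
        List<Integer> received = new ArrayList<>();
        while (!open.isEmpty()) {
            int i = rnd.nextInt(open.size());   // arbitrary choice of channel
            Deque<Integer> channel = open.get(i);
            received.add(channel.pollFirst());  // order within a channel is kept
            if (channel.isEmpty()) {
                open.remove(i);                 // remove by index
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Parallelism 1: a single channel, so the send order is preserved.
        System.out.println(receive(List.of(List.of(1, 2, 3, 4)), 7L));
        // Parallelism 2: records from the two channels may interleave arbitrarily.
        System.out.println(receive(List.of(List.of(1, 3, 5), List.of(2, 4, 6)), 7L));
    }
}
```

      With a single channel the receiver can only ever see the send order. With two channels, the relative order within each channel is kept, but the interleaving across channels depends on the polling order, so two receivers may observe different sequences.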

      Fabian Hueske continues with his suggestions on how to ensure the correct ordering of the broadcast events:

      So there are two approaches:
      1) make the operator that broadcasts its output run as an operator with parallelism 1 (or add a MapOperator with parallelism 1 that just forwards its input). This will cause all broadcasted records to go through the same network channel and their order is guaranteed on each receiver.
      2) use timestamps of broadcasted records for ordering and watermarks to reason about completeness.

      If the broadcasted data is (comparatively) small in volume (which is usually given because otherwise broadcasting would be expensive), I'd go with the first option.
      The second approach is more difficult to implement.
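
      To give a rough idea of what the second approach involves, here is a minimal sketch in plain Java, assuming each broadcast record carries an event timestamp and that a watermark `w` means no further record with a timestamp at or below `w` will arrive. The class `WatermarkReorderBuffer` and its methods are hypothetical and not part of the Flink API. In an actual job the buffered records would presumably have to be kept in Flink state so that they survive failures, which is part of why this approach is harder to implement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class WatermarkReorderBuffer {

    /** A buffered broadcast record with its event timestamp (hypothetical type). */
    private static final class Stamped {
        final long timestamp;
        final String payload;

        Stamped(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    // Min-heap on timestamp: the earliest pending record is always at the head.
    private final PriorityQueue<Stamped> pending =
            new PriorityQueue<>(Comparator.comparingLong((Stamped s) -> s.timestamp));

    /** Buffers a record that may have arrived out of order. */
    public void add(long timestamp, String payload) {
        pending.add(new Stamped(timestamp, payload));
    }

    /**
     * Releases, in ascending timestamp order, every buffered record whose
     * timestamp is at or below the watermark. Records with equal timestamps
     * are released in no particular order.
     */
    public List<String> onWatermark(long watermark) {
        List<String> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().timestamp <= watermark) {
            ready.add(pending.poll().payload);
        }
        return ready;
    }
}
```

      A receiver would call `add` for each incoming broadcast record and apply the records returned by `onWatermark` only once the watermark has passed them. This trades latency for a deterministic order, and it relies on the broadcast stream having timestamps and watermarks assigned upstream.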

      It would be great if the ordering guarantees could be clarified to avoid confusion. This could be achieved by simply expanding the paragraph that talks about the order of events in the "important considerations" section. More ambitiously, the suggestions given by Fabian Hueske could be turned into examples.


          People

            Unassigned
            fniksic Filip Niksic


              Time Tracking

              Original Estimate: 1h
              Remaining Estimate: 1h
              Time Spent: Not Specified
