  1. Beam
  2. BEAM-7423

Spark unbounded source advances watermarks prematurely

Details

    • Type: Bug
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: runner-spark
    • Labels: None

    Description

      SparkUnboundedSource advances the watermark to the MAX of the watermarks across all partitions. You can see it at https://github.com/apache/beam/blob/fab12c772d461fc8db4b3c361d38fe2781926fff/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L204 .

      This should be the MIN: the step combines per-partition watermarks, it does not advance a single watermark. As a result, the watermark currently moves too quickly, and elements from the slowest partition of an unbounded source are routinely marked late.
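
      To illustrate the difference, here is a minimal, self-contained sketch. It is not the code in SparkUnboundedSource; the class and method names (WatermarkCombineSketch, combineMax, combineMin, isLate) are hypothetical, watermarks are modeled as plain epoch-millisecond longs rather than Beam's Instant type, and the lateness check is deliberately simplified (it ignores windows and allowed lateness).

      ```java
      import java.util.List;

      // Hypothetical illustration only; not taken from the Beam code base.
      class WatermarkCombineSketch {

          // Behavior described in this report: take the largest partition watermark.
          static long combineMax(List<Long> partitionWatermarks) {
              long result = Long.MIN_VALUE;
              for (long wm : partitionWatermarks) {
                  result = Math.max(result, wm);
              }
              return result;
          }

          // Proposed behavior: the combined watermark is held back by the slowest partition.
          // An empty list yields Long.MAX_VALUE here; a real source would need to handle that case.
          static long combineMin(List<Long> partitionWatermarks) {
              long result = Long.MAX_VALUE;
              for (long wm : partitionWatermarks) {
                  result = Math.min(result, wm);
              }
              return result;
          }

          // Simplified lateness check: an element is late if its timestamp is behind the watermark.
          static boolean isLate(long elementTimestamp, long watermark) {
              return elementTimestamp < watermark;
          }

          public static void main(String[] args) {
              // Three partitions; the first one is the slowest.
              List<Long> watermarks = List.of(1_000L, 5_000L, 9_000L);
              // An element that the slowest partition emits next.
              long elementTimestamp = 1_500L;

              long max = combineMax(watermarks);
              long min = combineMin(watermarks);

              System.out.println("MAX watermark=" + max + ", late=" + isLate(elementTimestamp, max));
              System.out.println("MIN watermark=" + min + ", late=" + isLate(elementTimestamp, min));
          }
      }
      ```

      With MAX, the element from the slowest partition is already behind the combined watermark and is treated as late; with MIN, it is still on time.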

          People

            Assignee: Unassigned
            Reporter: Mike Kaplinskiy (mikekap)
            Votes: 0
            Watchers: 2
