Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-46064

EliminateEventTimeWatermark does not consider the fact that isStreaming flag can change for current child during resolution

    XMLWordPrintableJSON

Details

    Description

      Looks like this is a long standing bug.

      The object `EliminateEventTimeWatermark` is implemented as a rule, but it is not registered in analyzer/optimizer. Instead, it is called directly when withWatermark method is called, which means the rule is applied immediately against the child, regardless whether child is resolved or not.

      It is not an issue for the usage of pure DataFrame API because streaming sources have the flag isStreaming set to true even it is yet resolved, but mix-up of SQL and DataFrame API would expose the issue; we may not know the exact value of isStreaming flag on unresolved node and it is subject to change upon resolution.

      We should register EliminateEventTimeWatermark as a rule on analysis (or pre-optimization) instead, and do not apply the elimination if the child is not yet resolved.

      Attachments

        Issue Links

          Activity

            People

              kabhwan Jungtaek Lim
              kabhwan Jungtaek Lim
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: