Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- 2.4.4, 3.0.0
- None
Description
SPARK-24156 introduced the ability to run a batch without actual data, which enables fast state cleanup and lets evicted outputs be emitted without waiting for new data to arrive.
This breaks an assumption in `ProgressReporter.extractStateOperatorMetrics`. See the comment in the source code:
// lastExecution could belong to one of the previous triggers if `!hasNewData`.
// Walking the plan again should be inexpensive.
The code then replaces newNumRowsUpdated with 0 if hasNewData is false. That makes sense when the progress is copied from the previous execution (which means no batch was run this time), but after SPARK-24156 that precondition no longer holds, because a batch can run even when hasNewData is false.
Spark should still replace the value of newNumRowsUpdated with 0 when no batch was run and the old value has to be copied from the previous execution, but it should not touch the value when it ran a batch without data.
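
The intended behavior can be illustrated with a minimal sketch. This is not the actual Spark code or the committed fix: `OperatorProgress`, `StateMetricsSketch`, and the `batchWasRun` flag are hypothetical names used only to show the distinction between "no batch was run" and "a batch was run without data".

```scala
// Hypothetical, simplified stand-in for a state operator's progress metrics.
final case class OperatorProgress(numRowsTotal: Long, numRowsUpdated: Long)

object StateMetricsSketch {

  /**
   * @param lastExecutionMetrics metrics read from the most recent executed plan
   * @param batchWasRun          assumed flag: true if a batch ran in this trigger,
   *                             whether or not it had new data
   */
  def extractStateOperatorMetrics(
      lastExecutionMetrics: Seq[OperatorProgress],
      batchWasRun: Boolean): Seq[OperatorProgress] = {
    if (batchWasRun) {
      // The metrics belong to the batch that just ran, so report them unchanged,
      // even when that batch processed no input rows.
      lastExecutionMetrics
    } else {
      // No batch ran: the metrics are copied from an earlier trigger, so the
      // "updated" count is stale and is reset to 0.
      lastExecutionMetrics.map(_.copy(numRowsUpdated = 0L))
    }
  }
}
```

With `Seq(OperatorProgress(10L, 3L))` as input, the sketch leaves the value untouched when `batchWasRun = true` and resets `numRowsUpdated` to 0 when `batchWasRun = false`. Keying the decision on whether a batch ran, rather than on `hasNewData`, is the point of the change described above.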