Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-18429

Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)

    XMLWordPrintableJSON

Details

    Description

      The CheckpointListener interface is implemented by many users. Adding a new method notifyCheckpointAborted(long) to the interface without a default method breaks many user programs.

      We should turn this method into a default method:

      • Avoid breaking programs
      • It is in practice less relevant for programs to react to checkpoints being aborted then to being completed. The reason is that on completion you often want to commit side-effects, while on abortion you frequently do not do anything, but let the next successful checkpoint commit all changes up to then.

      Original Confusion

      There was confusion about this originally, going back to a comment by myself suggesting this should not be a default method, incorrectly thinking of it as an internal interface: https://github.com/apache/flink/pull/8693#issuecomment-542834147

      See also clarification email on the mailing list:

      About the "notifyCheckpointAborted()":
      
      When I wrote that comment, I was (apparently wrongly) assuming we were talking about an
      internal interface here, because the "abort" signal was originally only intended to cancel the
      async part of state backend checkpoints.
      
      I just realized that this is exposed to users - and I am actually with Thomas on this one. The
      "CheckpointListener" is a very public interface that many users implement. The fact that it is
      tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will
      in reality break lots and lots of user programs.
      
      I think also in practice it is much less relevant for user applications to react to aborted checkpoints.
      Since the notifications there can not be relied upon (if there is a task failure concurrently) users
      always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort
      method is probably rarely relevant.
      
      This is something we should change, in my opinion.
      

      Attachments

        Issue Links

          Activity

            People

              sewen Stephan Ewen
              sewen Stephan Ewen
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: