Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-13930

Address StateSpec inconsistency between Runner and Fn API

Details

    Description

      The ability to mix and match runners and SDKs is accomplished through two portability layers:
      1. The Runner API provides an SDK-and-runner-independent definition of a Beam pipeline
      2. The Fn API allows a runner to invoke SDK-specific user-defined functions

      Apache Beam pipelines support executing stateful DoFns[1]. To support this execution the Runner API defines multiple user state specifications:

      • ReadModifyWriteStateSpec
      • BagStateSpec
      • OrderedListStateSpec
      • CombiningStateSpec
      • MapStateSpec
      • SetStateSpec

      The Fn API[2] defines APIs[3] to get, append and clear user state currently supporting a BagUserState and MultimapUserState protocol.

      Since there is no clear mapping between the Runner API and Fn API state specifications, there is no way for a runner to know that it supports a given API necessary to support the execution of the pipeline. The Runner will also have to manage additional runtime metadata associated with which protocol was used for a type of state so that it can successfully manage the state’s lifetime once it can be garbage collected.

      Please see the doc[4] for further details and a proposal on how to address this shortcoming.

      1: https://beam.apache.org/blog/stateful-processing/
      2: https://github.com/apache/beam/blob/3ad05523f4cdf5122fc319276fcb461f768af39d/model/fn-execution/src/main/proto/beam_fn_api.proto#L742
      3: https://s.apache.org/beam-fn-state-api-and-bundle-processing
      4: http://doc/1ELKTuRTV3C5jt_YoBBwPdsPa5eoXCCOSKQ3GPzZrK7Q

      Attachments

        Activity

          People

            lcwik Luke Cwik
            lcwik Luke Cwik
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 2h 50m
                2h 50m