Beam / BEAM-3221 Model pipeline representation improvements / BEAM-3227

Consider sharing Udf/SdkFunctionSpec records via pointer

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: P2
    • Resolution: Won't Fix
    • Component: beam-model

    Description

      Coders are stored by pointer, because they are often repeated and a common source of huge pipeline descriptions.

      We considered doing the same for all UDFs but decided against it, reasoning that UDFs are identical less often than coders and will rarely implement the equals() needed to actually share encoded versions.

      However, in the presence of generated code, it is very likely that DoFns and CombineFns are repeated, and also much more likely that they have meaningful equals(), so there could be size savings.

      None of this is terribly important for storage or transmission; it has more to do with the arbitrary, small size limits imposed by some API frameworks and database column types.
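
To make the trade-off concrete, here is a minimal sketch of sharing function specs by pointer (ID) instead of embedding them inline. It is not Beam's actual representation: `FunctionSpec`, `Components`, `register`, the `fn0`-style IDs, and the `example:dofn:v1` URN are all hypothetical names chosen for illustration. The frozen dataclass supplies value equality and hashing, which plays the role that a meaningful equals() plays in the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionSpec:
    """Hypothetical stand-in for an SdkFunctionSpec: a URN plus a serialized payload."""
    urn: str
    payload: bytes


@dataclass
class Components:
    """Hypothetical registry that stores each distinct spec once, keyed by ID."""
    specs: dict = field(default_factory=dict)  # id -> FunctionSpec
    _ids: dict = field(default_factory=dict)   # FunctionSpec -> id

    def register(self, spec: FunctionSpec) -> str:
        # Sharing only works because FunctionSpec has value equality and a
        # matching hash; without that, every spec would get its own entry.
        existing = self._ids.get(spec)
        if existing is not None:
            return existing
        spec_id = f"fn{len(self.specs)}"
        self.specs[spec_id] = spec
        self._ids[spec] = spec_id
        return spec_id


def inline_size(specs) -> int:
    """Approximate size when every transform embeds its own copy of the spec."""
    return sum(len(s.urn) + len(s.payload) for s in specs)


def shared_size(components: Components, ids) -> int:
    """Approximate size when transforms hold only an ID into the registry."""
    stored = sum(len(s.urn) + len(s.payload) for s in components.specs.values())
    return stored + sum(len(i) for i in ids)


# Simulate generated code that emits the same DoFn spec for 100 transforms.
generated = [FunctionSpec("example:dofn:v1", b"x" * 1000) for _ in range(100)]
components = Components()
ids = [components.register(spec) for spec in generated]
```

In this sketch all 100 transforms end up pointing at a single stored spec, so `shared_size(components, ids)` is far smaller than `inline_size(generated)`. If the specs lacked value equality, the registry would hold 100 entries and the pointer indirection would only add overhead, which is the concern raised about UDFs that do not implement equals().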

          People

            Assignee: Unassigned
            Reporter: Kenneth Knowles (kenn)
            Votes: 0
            Watchers: 2