Beam / BEAM-12737

[SQL] add API to retrieve failed events in the collection due to query runtime error

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.35.0
    • Component/s: dsl-sql
    • Labels: None

    Description

      When calling
      ```
      collection.apply(SqlTransform.query(query))
      ```
      we had cases in production where the query fails for a small number of events in the collection.
      For example:
      1. Functions or UDFs that throw a NullPointerException when called with a null value.
      2. Casts in SQL that fail on invalid data.
      and more.
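The two failure modes listed above can be reproduced without Beam. The following is a minimal plain-Java sketch, not Beam code: `QueryFailureModes`, `upper`, and `castToInt` are hypothetical names that stand in for a UDF body and a SQL cast over untrusted input.

```java
import java.util.Locale;

/** Plain-Java stand-ins for the two failure modes; hypothetical names, not Beam API. */
final class QueryFailureModes {

  // Hypothetical UDF body: dereferences its argument without a null check.
  static String upper(String value) {
    return value.toUpperCase(Locale.ROOT); // throws NullPointerException when value is null
  }

  // Hypothetical cast: models casting a string column to an integer.
  static int castToInt(String value) {
    return Integer.parseInt(value); // throws NumberFormatException for non-numeric text
  }

  public static void main(String[] args) {
    // Well-formed records pass through.
    System.out.println(upper("ok"));
    System.out.println(castToInt("42"));

    // A single bad record is enough to raise a runtime exception.
    try {
      upper(null);
    } catch (NullPointerException e) {
      System.out.println("UDF failed on null input");
    }
    try {
      castToInt("4x2");
    } catch (NumberFormatException e) {
      System.out.println("cast failed on invalid data");
    }
  }
}
```

Both methods behave correctly on well-formed records, which is why such bugs can pass development testing and only show up on rare production inputs.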

      Of course we can fix those issues, but the point is that they can be missed while developing the query and surface only at production runtime, in rare cases.

      The current behaviour is that the query fails and the records are retried forever.

      We would like an option to retrieve the failed records from the query, so that we can send them to a DLQ or skip them entirely.

      Is it possible to have something similar to what BigQueryIO offers with `getFailedInsertsWithErr`?

      .apply(BigQueryIO.write()
              .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
              .withExtendedErrorInfo())
      .getFailedInsertsWithErr().apply(....);
      

      This is also recommended in the Beam PTransform style guide: https://beam.apache.org/contribute/ptransform-style-guide/#error-handling
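The requested behaviour amounts to dead-letter routing: records that fail are collected separately instead of failing the whole step. The following is a minimal plain-Java sketch of that idea under stated assumptions. It does not use Beam or any SqlTransform API, and `DeadLetterSketch`, `Result`, `Failure`, and `applyWithDeadLetter` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Plain-Java sketch of dead-letter routing; hypothetical names, not Beam API. */
final class DeadLetterSketch {

  /** A record that could not be processed, paired with a description of the error. */
  static final class Failure<T> {
    final T record;
    final String error;

    Failure(T record, String error) {
      this.record = record;
      this.error = error;
    }
  }

  /** Successful outputs plus the records that failed. */
  static final class Result<T, R> {
    final List<R> outputs = new ArrayList<>();
    final List<Failure<T>> failures = new ArrayList<>();
  }

  /** Applies fn to every record; a runtime error diverts that record instead of aborting. */
  static <T, R> Result<T, R> applyWithDeadLetter(List<T> records, Function<T, R> fn) {
    Result<T, R> result = new Result<>();
    for (T record : records) {
      try {
        result.outputs.add(fn.apply(record));
      } catch (RuntimeException e) {
        // Keep the original record so it can be sent to a DLQ or skipped later.
        result.failures.add(new Failure<>(record, e.toString()));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("1", "2", "x3", "4");
    Result<String, Integer> result = applyWithDeadLetter(input, Integer::parseInt);
    System.out.println("outputs: " + result.outputs);
    for (Failure<String> failure : result.failures) {
      System.out.println("failed: " + failure.record + " (" + failure.error + ")");
    }
  }
}
```

Keeping the original record next to the error mirrors what `getFailedInsertsWithErr` is used for in the BigQueryIO snippet above: the caller decides whether to forward failures to a DLQ or drop them.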

          People

            Assignee: Brachi Packter
            Reporter: Brachi Packter