Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-14365

[SpannerIO.readChangeStream] The reported backlog size should not be negative

Details

    • Bug
    • Status: Resolved
    • P1
    • Resolution: Fixed
    • 2.39.0
    • 2.39.0
    • io-java-gcp
    • None

    Description

      We got the following error when testing the Spanner change streams connector: 

      "java.lang.IllegalArgumentException: Expected size >= 0 but received -36590.86666666666. at org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.validateSize(ByteBuddyDoFnInvokerFactory.java:438) at org.apache.beam.sdk.io.gcp.spanner.changestreams.dofn.ReadChangeStreamPartitionDoFn$DoFnInvoker.invokeGetSize(Unknown Source) at org.apache.beam.fn.harness.FnApiDoFnRunner.calculateRestrictionSize(FnApiDoFnRunner.java:1182) at org.apache.beam.fn.harness.FnApiDoFnRunner.trySplitForElementAndRestriction(FnApiDoFnRunner.java:1625) at org.apache.beam.fn.harness.FnApiDoFnRunner.access$1800(FnApiDoFnRunner.java:142) at org.apache.beam.fn.harness.FnApiDoFnRunner$SplittableFnDataReceiver.trySplit(FnApiDoFnRunner.java:1113) at org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$SplittingMetricTrackingFnDataReceiver.trySplit(PCollectionConsumerRegistry.java:342) at org.apache.beam.fn.harness.BeamFnDataReadRunner.trySplit(BeamFnDataReadRunner.java:259) at org.apache.beam.fn.harness.control.ProcessBundleHandler.trySplit(ProcessBundleHandler.java:688) at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151) at org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:833) " 

      This indicates that the return value of getSize() in SDF is negative. Since the fields in Progress cannot be negative, the negative number should come from the local throughput estimator. 

       

      When this error occurs, the pipeline gets stuck and no progress is made. This prevents users from enabling autoscaling when using the SpannerIO change streams functionality.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              hengfeng Hengfeng Li
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 40m
                  40m