Beam / BEAM-7589

Kinesis IO.write throws LimitExceededException


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.11.0
    • Fix Version/s: 2.15.0
    • Component/s: io-java-kinesis
    • Labels:
      None

      Description

      Follow up from https://issues.apache.org/jira/browse/BEAM-7357:

       


      Brachi Packter added a comment - 13/Jun/19 09:05
      Alexey Romanenko I think I found what triggers the shard map update now.

      You create a producer per bundle (in the setUp function), and if I multiply that by the number of workers, it gives a huge number of producers; I believe this is what triggers the "update shard map" call.

      If I copy your code and create one producer for every worker, then this error disappears.

      Could you remove the producer creation from the setUp method and move it to a static field in the class, so it is created once when the class is initialized?

      See the similar issue we had with JdbcIO: the connection pool was created per setUp method, and we moved it to a static member so that there is one pool per JVM. Ask Ismaël Mejía for more details.
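
      The suggestion above (one shared producer per JVM instead of one per bundle, mirroring the JdbcIO connection-pool fix) can be sketched in plain Java. `KinesisClientStub` is a hypothetical stand-in for the real KPL producer; the actual fix would live inside KinesisIO's writer:

      ```java
      // Sketch of sharing one expensive client per JVM instead of creating one
      // per bundle in @Setup. KinesisClientStub is a hypothetical stand-in for
      // the real producer, whose construction triggers control-plane calls
      // (e.g. a shard-map update).
      public class SharedProducer {

          static class KinesisClientStub {
              static int instancesCreated = 0;
              KinesisClientStub() { instancesCreated++; }
          }

          // Initialization-on-demand holder idiom: the JVM guarantees the
          // single instance is created lazily and thread-safely.
          private static class Holder {
              static final KinesisClientStub INSTANCE = new KinesisClientStub();
          }

          public static KinesisClientStub get() {
              return Holder.INSTANCE;
          }
      }
      ```

      With a pattern like this, every setUp call on every bundle would fetch the same instance, so the number of producers (and hence shard-map updates) no longer grows with the number of bundles.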


      Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
      Brachi Packter What kind of error do you have in this case? Could you post an error stacktrace / exception message? 
      Also, it would be helpful (if it's possible) if you could provide more details about your environment and pipeline, like what is your pipeline topology, which runner do you use, number of workers in your cluster, etc. 
      For now, I can't reproduce it on my side, so all additional info will be helpful.


      Brachi Packter added a comment - 16/Jun/19 06:44
      I get the same error:

      [0x00001728][0x00007f13ed4c4700] [error] [shard_map.cc:150] Shard map update for stream "**" failed. Code: LimitExceededException Message: Rate exceeded for stream poc-test under account **.; retrying in 5062 ms
      

      I'm not seeing the full stack trace, but I can also see this in the log:

      [2019-06-13 08:29:09.427018] [0x000007e1][0x00007f8d508d3700] [warning] [AWS Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
      

      More details:
      I'm using the Dataflow runner, Java SDK 2.11.

      60 workers initially (with autoscaling and also with the "enableStreamingEngine" flag).

      Normally I'm producing 4-5k records per second, but when I have latency this can grow by 3-4x.

      When I start the Dataflow job I have latency, so I produce more data and fail immediately.

      Also, I have consumers (a 3rd-party tool); I know that they call DescribeStream every 30 seconds.
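
      A rough back-of-envelope sketch of the control-plane pressure this describes (all counts below are assumptions for illustration, not measured KPL behaviour): shard-map refreshes from every producer compete with the consumers' DescribeStream calls against a shared account-level quota, which is on the order of 10 calls per second.

      ```java
      // Back-of-envelope estimate of DescribeStream call pressure.
      // All numbers below are assumptions for illustration, not measured values.
      public class DescribeStreamBudget {

          // producers: total KPL instances across all workers/bundles (assumed)
          // refreshesPerProducerPerMin: shard-map refresh rate per producer (assumed)
          // consumerCallsPerSec: extra control-plane calls from consumers
          public static double callsPerSecond(int producers,
                                              double refreshesPerProducerPerMin,
                                              double consumerCallsPerSec) {
              return producers * refreshesPerProducerPerMin / 60.0 + consumerCallsPerSec;
          }

          public static void main(String[] args) {
              // 60 workers with a producer per bundle could easily mean 1200 live
              // producers (assumed), each refreshing its shard map once a minute
              // (assumed), plus a consumer calling DescribeStream every 30 s.
              double perBundle = callsPerSecond(1200, 1.0, 1.0 / 30.0);
              // One shared producer per worker instead:
              double perWorker = callsPerSecond(60, 1.0, 1.0 / 30.0);
              System.out.printf("per-bundle producers: ~%.1f calls/sec%n", perBundle);
              System.out.printf("one per worker:       ~%.1f calls/sec%n", perWorker);
              // Compare both against a shared account-level quota on the order
              // of 10 DescribeStream calls per second.
          }
      }
      ```

      Under these assumed numbers, per-bundle producers blow past the quota while one producer per worker stays well under it, which matches the observation that the error disappears with a single producer per worker.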

      My pipeline runs on GCP, reading data from PubSub at around 20,000 records per second in normal times (up to 100,000 records per second during latency spikes). It does many aggregations and counts based on some dimensions (using Beam SQL) over a 1-minute sliding window, and writes the aggregation results to a Kinesis stream.

      My stream has 10 shards, and my partition key logic generates a UUID per record:

      UUID.randomUUID().toString()

      I hope this gives you some more context on my problem.
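
      As a side note on the UUID partition keys: Kinesis routes each record by the MD5 hash of its partition key, so random UUIDs spread writes roughly evenly across the 10 shards, which rules out a hot shard as the cause here (and indeed the LimitExceededException is a control-plane error, not per-shard throttling). A minimal sketch of that routing, assuming evenly split hash-key ranges:

      ```java
      import java.math.BigInteger;
      import java.nio.charset.StandardCharsets;
      import java.security.MessageDigest;
      import java.util.UUID;

      public class ShardMapping {
          // Kinesis routes a record by interpreting the MD5 hash of its
          // partition key as an unsigned 128-bit integer and picking the shard
          // whose hash-key range contains it. With evenly split ranges this
          // reduces to a simple division.
          public static int shardFor(String partitionKey, int numShards) throws Exception {
              byte[] md5 = MessageDigest.getInstance("MD5")
                      .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
              BigInteger hash = new BigInteger(1, md5);          // unsigned 128-bit
              BigInteger range = BigInteger.ONE.shiftLeft(128);  // 2^128 hash keys
              return hash.multiply(BigInteger.valueOf(numShards)).divide(range).intValue();
          }

          public static void main(String[] args) throws Exception {
              int[] counts = new int[10];
              for (int i = 0; i < 10_000; i++) {
                  counts[shardFor(UUID.randomUUID().toString(), 10)]++;
              }
              // Random UUID keys should land roughly uniformly (~1000 per shard).
              for (int c : counts) System.out.print(c + " ");
          }
      }
      ```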

      Another suggestion: could you try fixing the issue as I suggested and provide me a specific version for testing, without merging it to master? (I would do it myself, but I had trouble building the huge Apache Beam repository locally.)


              People

              • Assignee:
                aromanenko Alexey Romanenko
              • Reporter:
                kedin Anton Kedin
              • Votes:
                0
              • Watchers:
                6

                Dates

                • Created:
                • Updated:
                • Resolved:

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 5h 20m