Avoid latency when polling for activity tasks - AWS Step Functions

The GetActivityTask API is designed to provide a taskToken exactly once. If a taskToken is dropped while communicating with an activity worker, a number of GetActivityTask requests can be blocked for 60 seconds, until GetActivityTask times out.

If you only have a small number of polls waiting for a response, it's possible that all requests will queue up behind the blocked request and stop. However, if you have a large number of outstanding polls for each activity Amazon Resource Name (ARN), and some percentage of your requests are stuck waiting, there will be many more that can still get a taskToken and begin to process work.

For production systems, we recommend at least 100 open polls per activity ARN at each point in time. If one poll gets blocked, and a portion of those polls queue up behind it, there are still many more requests that will receive a taskToken to process work while the GetActivityTask request is blocked.

To avoid these kinds of latency problems when polling for tasks:

  • Implement your pollers as separate threads from the work in your activity worker implementation.

  • Have at least 100 open polls per activity ARN at each point in time.


    Scaling to 100 open polls per ARN can be expensive. For example, 100 Lambda functions polling per ARN is 100 times more expensive than having a single Lambda function with 100 polling threads. To both reduce latency and minimize cost, use a language that has asynchronous I/O, and implement multiple polling threads per worker. For an example activity worker where the poller threads are separate from the work threads, see Example Activity Worker in Ruby.
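
As a rough illustration of the first recommendation, the following Python sketch runs poller threads separately from work threads and connects them with an in-memory queue. It is a minimal sketch, not an official sample: start_worker and its poll, handle, and report parameters are hypothetical names introduced here, and the sketch omits error handling, heartbeats, and failure reporting.

```python
import queue
import threading


def start_worker(poll, handle, report, stop, num_pollers=4, num_workers=2):
    """Start poller and work threads that share an in-memory queue.

    All four arguments are hypothetical hooks supplied by the caller:
    poll()                -> (task_token, task_input), or None if the poll timed out
    handle(task_input)    -> result of the activity
    report(token, result) -> sends the result back for that token
    stop                  -> threading.Event used to shut everything down
    """
    tasks = queue.Queue()

    def poller():
        # Pollers only poll. They never run activity work, so a slow task
        # cannot reduce the number of open polls.
        while not stop.is_set():
            task = poll()
            if task is not None:
                tasks.put(task)

    pollers = [threading.Thread(target=poller, daemon=True) for _ in range(num_pollers)]

    def work():
        while True:
            try:
                token, task_input = tasks.get(timeout=0.1)
            except queue.Empty:
                # Exit only after shutdown was requested and no poller
                # can still add a task to the queue.
                if stop.is_set() and not any(p.is_alive() for p in pollers):
                    return
                continue
            report(token, handle(task_input))

    workers = [threading.Thread(target=work, daemon=True) for _ in range(num_workers)]
    for t in pollers + workers:
        t.start()
    return pollers + workers
```

In a real worker, poll would typically wrap a GetActivityTask call and return None when the response carries no taskToken, and report would wrap SendTaskSuccess. That wiring is left out here so the sketch stays independent of any SDK. Raising num_pollers increases the number of open polls from a single process without adding more processes.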

For more information on activities and activity workers, see Activities.