
On-demand scaling example

For an initial burst of traffic, your cumulative concurrency in a Region can reach an initial level of between 500 and 3000 execution environment instances, depending on the Region. After this initial burst, functions can scale by an additional 500 instances per minute.
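
As a rough illustration of this scaling rule, the following sketch (in Python, assuming an illustrative Region with an initial burst of 3000 and an account concurrency limit of 10,000) computes the maximum number of concurrent execution environments available in each minute of a sustained burst:

    # Sketch of the on-demand scaling rule described above (values are
    # illustrative; the initial burst is 500-3000 depending on the Region).
    INITIAL_BURST = 3000
    SCALE_PER_MINUTE = 500
    ACCOUNT_LIMIT = 10_000

    def max_concurrency(minute: int) -> int:
        """Maximum concurrent execution environments available in a given
        minute of a sustained burst (minute 1 is the first minute)."""
        return min(INITIAL_BURST + SCALE_PER_MINUTE * (minute - 1), ACCOUNT_LIMIT)

    for m in range(1, 5):
        print(f"minute {m}: up to {max_concurrency(m)} concurrent environments")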

In this example, a Lambda function receives 10,000 synchronous requests from API Gateway. The concurrency limit for the account is 10,000. The following diagram shows four scenarios:


                [Figure: application design figure 4 - four scaling scenarios]

In each case, all of the requests arrive at the start of the minute in which they are scheduled (a simulation that reproduces these figures follows the list):

  1. All requests arrive immediately: 3000 requests are handled by new execution environments; 7000 are throttled.

  2. Requests arrive over 2 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 2000 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; 1500 are throttled.

  3. Requests arrive over 3 minutes: 3000 requests are handled by new execution environments in the first minute; the remaining 333 are throttled. In minute 2, another 500 environments are created and the 3000 original environments are reused; all requests are served. In minute 3, the remaining 3334 requests are served by warm environments.

  4. Requests arrive over 4 minutes: In minute 1, 2500 requests are handled by new execution environments; the same environments are reused in subsequent minutes to serve all requests.
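
The throttling arithmetic in these scenarios can be reproduced with a short simulation. The sketch below (again assuming a 3000-instance initial burst and a 10,000 account limit) spreads the 10,000 requests evenly over a given number of minutes, carries warm environments forward, and reports how many requests are served or throttled each minute:

    # Minimal, simplified model of the four scenarios above. It ignores
    # request duration and environment recycling; values are illustrative.
    INITIAL_BURST = 3000
    SCALE_PER_MINUTE = 500
    ACCOUNT_LIMIT = 10_000
    TOTAL_REQUESTS = 10_000

    def simulate(minutes: int) -> None:
        per_minute, remainder = divmod(TOTAL_REQUESTS, minutes)
        environments = 0  # warm execution environments carried between minutes
        for m in range(1, minutes + 1):
            requests = per_minute + (remainder if m == minutes else 0)
            # Environments can grow to meet demand, up to this minute's ceiling.
            ceiling = min(INITIAL_BURST + SCALE_PER_MINUTE * (m - 1), ACCOUNT_LIMIT)
            environments = min(max(environments, requests), ceiling)
            served = min(requests, environments)
            print(f"minute {m}: {served} served, {requests - served} throttled")

    for minutes in (1, 2, 3, 4):
        print(f"--- 10,000 requests spread over {minutes} minute(s) ---")
        simulate(minutes)

Running the simulation reproduces the figures above: 7000 requests throttled in scenario 1, then 2000 and 1500 in scenario 2, then 333 in the first minute of scenario 3, and no throttling in scenario 4.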