Provisioned Concurrency scaling example - AWS Lambda

Provisioned Concurrency scaling example

The majority of Lambda workloads are asynchronous so the default scaling behavior provides a reasonable trade-off between throughput and configuration management overhead. However, for synchronous invocations from interactive workloads, such as web or mobile applications, there are times when you need more control over how many concurrent function instances are ready to receive traffic.

Provisioned Concurrency is a Lambda feature that prepares concurrent execution environments in advance of invocations. Consequently, this can be used to address two issues. First, if expected traffic arrives more quickly than the default burst capacity, Provisioned Concurrency can ensure that your function is available to meet the demand. Second, if you have latency-sensitive workloads that require predictable double-digit millisecond latency, Provisioned Concurrency solves the typical cold start issues associated with default scaling.

Provisioned Concurrency is a configuration available for a specific published version or alias of a Lambda function. It does not rely on any custom code or changes to a function’s logic, and it’s compatible with features such as VPC configuration, Lambda layers. For more information on how Provisioned Concurrency optimizes performance for Lambda-based applications, see Performance optimization.

Using the same scenarios with 10,000 requests, the function is configured with a Provisioned Concurrency of 7,000:

               application design figure 5
  1. In case #1, 7,000 requests are handled by the provisioned environments with no cold start. The remaining 3,000 requests are handled by new, on-demand execution environments.

  2. In cases #2-4, all requests are handled by provisioned environments in the minute when they arrive.