Configuring provisioned concurrency
In Lambda, concurrency is the number of in-flight requests that your function is handling at the same time. There are two types of concurrency controls available:
- Reserved concurrency – Reserved concurrency is the maximum number of concurrent instances that you want to allocate to your function. When a function has reserved concurrency, no other function can use that concurrency. There is no charge for configuring reserved concurrency for a function.
- Provisioned concurrency – Provisioned concurrency is the number of pre-initialized execution environments that you want to allocate to your function. These execution environments are prepared to respond immediately to incoming function requests. Configuring provisioned concurrency incurs charges to your AWS account.
This topic details how to manage and configure provisioned concurrency. For a conceptual overview of these two types of concurrency controls, see Reserved concurrency and provisioned concurrency. For more information on configuring reserved concurrency, see Configuring reserved concurrency.
Note
Lambda functions that an Amazon MQ event source mapping invokes have a default maximum concurrency. For Apache ActiveMQ, the maximum number of concurrent instances is 5. For RabbitMQ, the maximum number of concurrent instances is 1. Setting reserved or provisioned concurrency for your function doesn't change these limits. To request an increase in the default maximum concurrency when using Amazon MQ, contact AWS Support.
Configuring provisioned concurrency
You can configure provisioned concurrency settings for a function using the Lambda console or the Lambda API.
To allocate provisioned concurrency for a function (console)
- Open the Functions page of the Lambda console.
- Choose the function you want to allocate provisioned concurrency for.
- Choose Configuration and then choose Concurrency.
- Under Provisioned concurrency configurations, choose Add configuration.
- Choose the qualifier type, and alias or version.
Note
You cannot use provisioned concurrency with the $LATEST version of any function.
In addition, if you're using an event source with your Lambda function, make sure that event source points to the correct alias or version. Otherwise, your function won't use provisioned concurrency environments.
- Enter a number under Provisioned concurrency. Lambda provides an estimate of monthly costs.
- Choose Save.
You can configure up to the Unreserved account concurrency in your account, minus 100. The remaining 100 units of concurrency are for functions that aren't using reserved concurrency. For example, if your account has a concurrency limit of 1,000, and you haven't assigned any reserved or provisioned concurrency to any of your other functions, you can configure a maximum of 900 provisioned concurrency units for a single function.

Configuring provisioned concurrency for a function impacts the concurrency pool that's available to other functions. For example, if you configure 100 units of provisioned concurrency for function-a, other functions in your account must share the remaining 900 units of concurrency, even if function-a doesn't use all 100 provisioned concurrency units.
You can allocate both reserved concurrency and provisioned concurrency for the same function. If you do so, the amount of provisioned concurrency cannot exceed the amount of reserved concurrency.
This limit also applies to function versions. The maximum amount of provisioned concurrency you can allocate to a specific function version is equal to the function's reserved concurrency minus the provisioned concurrency on its other versions. For example, if a function has 100 units of reserved concurrency and version 1 already has 60 units of provisioned concurrency, you can allocate at most 40 units of provisioned concurrency to version 2.
To configure provisioned concurrency with the Lambda API, use the PutProvisionedConcurrencyConfig, GetProvisionedConcurrencyConfig, and DeleteProvisionedConcurrencyConfig API operations.
For example, to configure provisioned concurrency with the AWS Command Line Interface (CLI), use the put-provisioned-concurrency-config command. The following command allocates 100 units of provisioned concurrency for the BLUE alias of a function named my-function:
aws lambda put-provisioned-concurrency-config --function-name my-function \
  --qualifier BLUE \
  --provisioned-concurrent-executions 100
You should see output that looks like the following:
{
  "RequestedProvisionedConcurrentExecutions": 100,
  "AllocatedProvisionedConcurrentExecutions": 0,
  "Status": "IN_PROGRESS",
  "LastModified": "2023-01-21T11:30:00+0000"
}
Accurately estimating required provisioned concurrency
If your function is currently serving traffic, you can easily view its concurrency metrics using CloudWatch metrics. Specifically, the ConcurrentExecutions metric shows you the number of concurrent invocations for each function in your account.

Suppose that a graph of this metric shows a function serving an average of 5 to 10 concurrent requests at any given time, and peaking at 20 requests on a typical day. Suppose also that there are many other functions in your account. If this function is critical to your application and you need a low-latency response on every invocation, use a number greater than or equal to 20 as your provisioned concurrency setting.
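To pull the same numbers programmatically instead of reading them off a graph, you can query CloudWatch directly. Here's a minimal sketch with boto3; the function name is a placeholder:
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hourly maximum concurrency for one function over the past day
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="ConcurrentExecutions",
    Dimensions=[{"Name": "FunctionName", "Value": "my-function"}],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])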
Alternatively, recall that you can also calculate concurrency using the following formula:
Concurrency = (average requests per second) * (average request duration in seconds)
Multiplying average requests per second by the average request duration in seconds gives you a rough estimate of how much concurrency you need to reserve. You can estimate average requests per second using the Invocations metric, and the average request duration in seconds using the Duration metric. See Working with Lambda function metrics for more details.
When working with provisioned concurrency, Lambda suggests including a 10% buffer on top of the amount of concurrency your function typically needs. For example, if your function usually peaks at 200 concurrent requests, set your provisioned concurrency at 220 instead (200 concurrent requests + 10% = 220 provisioned concurrency).
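Putting the formula and the buffer together, here's a small sketch of the arithmetic; the traffic numbers are hypothetical:
def estimate_provisioned_concurrency(avg_requests_per_second,
                                     avg_duration_seconds,
                                     buffer=0.10):
    # Concurrency = (average requests per second) * (average request duration)
    base = avg_requests_per_second * avg_duration_seconds
    # Add the suggested 10% buffer and round to a whole number of environments
    return round(base * (1 + buffer))

# Example: 400 requests per second at 500 ms per request
# -> 200 concurrent requests, plus a 10% buffer = 220
print(estimate_provisioned_concurrency(400, 0.5))  # 220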
Optimizing latency with provisioned concurrency
The way you structure your function code to optimize for latency can depend on whether you choose provisioned concurrency or on-demand environments. For functions running on provisioned concurrency, Lambda runs any initialization code (such as loading libraries and instantiating clients) at allocation time. This makes it a good idea to put as much initialization as possible outside of the main function handler, since that work won't add latency during actual function invocations. In contrast, if you initialize libraries or instantiate clients within your main handler code, your function has to run that code on every invocation, whether or not you're using provisioned concurrency.
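For example, a handler might look like the following minimal sketch. The DynamoDB table name is a placeholder; the point is that the client and table objects are created during initialization, outside the handler:
import boto3

# Initialization code: runs once per execution environment. With provisioned
# concurrency, this runs at allocation time, before any requests arrive.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")  # hypothetical table name

def handler(event, context):
    # Only per-request work happens here, so invocations stay fast
    response = table.get_item(Key={"id": event["id"]})
    return response.get("Item")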
If you're using on-demand instances, Lambda may have to re-run your initialization code whenever it creates a new execution environment (a cold start). Depending on what your function needs to achieve, you may choose to defer initialization for a specific capability until the function needs that capability. For example, consider the following control flow for a Lambda handler:
def handler(event, context):
    ...
    if some_condition:
        # Initialize CLIENT_A to perform a task
        ...
    else:
        # Do nothing
        ...
In the previous example, instead of initializing CLIENT_A outside of the main handler, the function author chose to initialize it within the if statement. By doing this, Lambda only runs this code if some_condition is satisfied. If the author initializes CLIENT_A outside the main handler, Lambda runs that code on every cold start, increasing overall latency.
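As a runnable version of this pattern, the following sketch defers creating an S3 client until a code path actually needs it. The use_s3 event field and the choice of client are assumptions for illustration, standing in for some_condition and CLIENT_A:
import boto3

# Cached client, created lazily on the first invocation that needs it
_s3_client = None

def handler(event, context):
    global _s3_client
    if event.get("use_s3"):  # stands in for some_condition
        if _s3_client is None:
            # Deferred initialization: only runs on this code path,
            # and only once per execution environment
            _s3_client = boto3.client("s3")
        return [b["Name"] for b in _s3_client.list_buckets()["Buckets"]]
    # Code paths that never need S3 skip client creation entirely
    return []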
It's possible for your function to use up all of its provisioned concurrency. To handle excess traffic, your function has to use on-demand instances. To help you determine what type of initialization Lambda used for a particular environment, check the value of the AWS_LAMBDA_INITIALIZATION_TYPE environment variable. This variable can have two possible values: provisioned-concurrency or on-demand. The value of AWS_LAMBDA_INITIALIZATION_TYPE is immutable and does not change over the lifetime of the execution environment.
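For example, a Python function could read the variable once during initialization and surface it, as in this minimal sketch:
import os

# Read once at initialization; the value never changes for the lifetime
# of the execution environment
INIT_TYPE = os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")

def handler(event, context):
    # Useful for logging or metrics: was this environment pre-initialized?
    if INIT_TYPE == "provisioned-concurrency":
        print("Serving from a provisioned-concurrency environment")
    return {"initializationType": INIT_TYPE}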
If you use the .NET 6 or .NET 7 runtimes, you can configure the AWS_LAMBDA_DOTNET_PREJIT environment variable to improve the latency for functions, even if they don't use provisioned concurrency. The .NET runtime lazily compiles and initializes each library that your code calls for the first time. As a result, the first invocation of a Lambda function can take longer than subsequent invocations. To mitigate this, you can choose one of three values for AWS_LAMBDA_DOTNET_PREJIT:
- ProvisionedConcurrency: Lambda performs ahead-of-time JIT compilation for all environments using provisioned concurrency. This is the default value.
- Always: Lambda performs ahead-of-time JIT compilation for every environment, even if the function doesn't use provisioned concurrency.
- Never: Lambda disables ahead-of-time JIT compilation for all environments.
For provisioned concurrency environments, your function's initialization code runs during allocation, and every few hours as Lambda recycles active instances of your environment. You can see the initialization time in logs and traces after an environment instance processes a request. However, Lambda bills you for initialization even if the environment instance never processes a request. Provisioned concurrency runs continually and is billed separately from initialization and invocation costs. For details, see AWS Lambda Pricing.
For additional guidance on optimizing functions using provisioned concurrency, see Lambda execution environments.
Managing provisioned concurrency with Application Auto Scaling
You can use Application Auto Scaling to manage provisioned concurrency on a schedule or based on utilization. If you observe predictable patterns of traffic to your function, use scheduled scaling. If you want your function to maintain a specific utilization percentage, use a target tracking scaling policy.
Scheduled scaling
With Application Auto Scaling, you can set your own scaling schedule according to predictable load changes. For more information and examples, see Scheduled scaling for Application Auto Scaling in the Application Auto Scaling User Guide, and Scheduling AWS Lambda Provisioned Concurrency for recurring peak usage.
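As an illustration, the following boto3 sketch registers a scheduled action that raises capacity on weekday mornings. The alias, capacities, and cron expression are hypothetical, and the alias must already be registered as a scalable target (shown in the target tracking steps below):
import boto3

aas = boto3.client("application-autoscaling")

# Raise minimum provisioned concurrency at 09:00 UTC on weekdays.
# The alias must already be registered as a scalable target.
aas.put_scheduled_action(
    ServiceNamespace="lambda",
    ScheduledActionName="scale-up-weekday-mornings",
    ResourceId="function:my-function:BLUE",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    Schedule="cron(0 9 ? * MON-FRI *)",
    ScalableTargetAction={"MinCapacity": 100, "MaxCapacity": 200},
)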
Target tracking
With target tracking, Application Auto Scaling creates and manages a set of CloudWatch alarms based on how you define your scaling policy. When these alarms activate, Application Auto Scaling automatically adjusts the number of environments allocated using provisioned concurrency. Target tracking is ideal for applications that don't have predictable traffic patterns.
To scale provisioned concurrency using target tracking, use the RegisterScalableTarget and PutScalingPolicy Application Auto Scaling API operations. For example, if you're using the AWS Command Line Interface (CLI), follow these steps:
- Register a function's alias as a scaling target. The following example registers the BLUE alias of a function named my-function:
aws application-autoscaling register-scalable-target --service-namespace lambda \
  --resource-id function:my-function:BLUE --min-capacity 1 --max-capacity 100 \
  --scalable-dimension lambda:function:ProvisionedConcurrency
- Apply a scaling policy to the target. The following example configures Application Auto Scaling to adjust the provisioned concurrency configuration for an alias to keep utilization near 70 percent.
aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --resource-id function:my-function:BLUE \
  --policy-name my-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 0.7,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    }
  }'
You should see output that looks like this:
{
  "PolicyARN": "arn:aws:autoscaling:us-east-2:123456789012:scalingPolicy:12266dbb-1524-xmpl-a64e-9a0a34b996fa:resource/lambda/function:my-function:BLUE:policyName/my-policy",
  "Alarms": [
    {
      "AlarmName": "TargetTracking-function:my-function:BLUE-AlarmHigh-aed0e274-xmpl-40fe-8cba-2e78f000c0a7",
      "AlarmARN": "arn:aws:cloudwatch:us-east-2:123456789012:alarm:TargetTracking-function:my-function:BLUE-AlarmHigh-aed0e274-xmpl-40fe-8cba-2e78f000c0a7"
    },
    {
      "AlarmName": "TargetTracking-function:my-function:BLUE-AlarmLow-7e1a928e-xmpl-4d2b-8c01-782321bc6f66",
      "AlarmARN": "arn:aws:cloudwatch:us-east-2:123456789012:alarm:TargetTracking-function:my-function:BLUE-AlarmLow-7e1a928e-xmpl-4d2b-8c01-782321bc6f66"
    }
  ]
}
Application Auto Scaling creates two alarms in CloudWatch. The first alarm triggers when the utilization of provisioned concurrency consistently exceeds 70%. When this happens, Application Auto Scaling allocates more provisioned concurrency to reduce utilization. The second alarm triggers when utilization is consistently less than 63% (90 percent of the 70% target). When this happens, Application Auto Scaling reduces the alias's provisioned concurrency.
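These two steps also map directly onto the SDKs. Here's a boto3 sketch mirroring the CLI example above:
import boto3

aas = boto3.client("application-autoscaling")

# Step 1: register the BLUE alias as a scalable target
aas.register_scalable_target(
    ServiceNamespace="lambda",
    ResourceId="function:my-function:BLUE",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    MinCapacity=1,
    MaxCapacity=100,
)

# Step 2: keep provisioned concurrency utilization near 70 percent
aas.put_scaling_policy(
    ServiceNamespace="lambda",
    ScalableDimension="lambda:function:ProvisionedConcurrency",
    ResourceId="function:my-function:BLUE",
    PolicyName="my-policy",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 0.7,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
        },
    },
)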
In the following example, a function scales between a minimum and maximum amount of provisioned concurrency based on utilization. (The accompanying graph, omitted here, plots function instances, open requests, provisioned concurrency, and standard concurrency over time.)
When the number of open requests increases, Application Auto Scaling increases provisioned concurrency in large steps until it reaches the configured maximum. After that, the function can continue to scale on standard, unreserved concurrency if your account hasn't reached its account concurrency limit. When utilization drops and stays consistently low, Application Auto Scaling decreases provisioned concurrency in smaller periodic steps.
Both of the alarms that Application Auto Scaling manages use the average statistic by default. Functions whose traffic comes in quick bursts may not trigger these alarms. For example, suppose your Lambda function executes quickly (in 20-100 ms, say) and your traffic arrives in quick bursts. In this case, the number of requests may exceed the allocated provisioned concurrency during a burst, but Application Auto Scaling doesn't provision additional environments unless the burst load is sustained for at least 3 minutes. Additionally, both CloudWatch alarms require three data points that hit the target average before activating the auto scaling policy.
For more information on target tracking scaling policies, see Target tracking scaling policies for Application Auto Scaling.