Throttle API requests for better throughput
To prevent your API from being overwhelmed by too many requests, Amazon API Gateway throttles requests to your API using the token bucket algorithm. When request submissions exceed the steady-state request rate and burst limits, API Gateway fails the limit-exceeding requests and returns 429 Too Many Requests error responses to the client. Upon catching such exceptions, the client can resubmit the failed requests at a reduced rate that complies with the API Gateway throttling limits.
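One common way for a client to resubmit throttled requests at a reduced rate is exponential backoff with jitter. The following is a minimal sketch of that idea, not an API Gateway or AWS SDK feature. The `send` callable, the function name, and all default values are hypothetical and chosen only for illustration.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical zero-argument callable that performs one request
    and returns its HTTP status code. sleep and rng are injectable so the
    helper can be exercised without real waiting.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Capped exponential delay, scaled by a random factor in [0, 1)
            # so that many clients do not retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt) * rng()
            sleep(delay)
    # Still throttled after the last attempt: report 429 to the caller.
    return status
```

A caller would wrap its own request function, for example `call_with_backoff(lambda: do_request().status_code)`, where `do_request` stands in for whatever HTTP client the application uses.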
As an API developer, you can set the limits for individual API stages or methods to improve overall performance across all APIs in your account. Alternatively, you can enable usage plans to restrict client request submissions to within specified request rates and quotas. This restricts the overall request submissions so that they don't go significantly past the account-level throttling limits in a Region.
How throttling limit settings are applied in API Gateway
Before you configure limit settings for your API in your stage settings and optionally a usage plan, it's useful to understand how Amazon API Gateway applies throttling limit settings.
Amazon API Gateway provides two basic types of throttling-related settings:
- Server-side throttling limits are applied across all clients. These limit settings exist to prevent your API, and your account, from being overwhelmed by too many requests.
- Per-client throttling limits are applied to clients that use API keys associated with your usage plan as the client identifier.
API Gateway throttling-related settings are applied in the following order:
1. Per-client per-method throttling limits that you set for an API stage in a usage plan
2. Per-client throttling limits that you set in a usage plan
3. Default per-method limits and individual per-method limits that you set in API stage settings
Account-level throttling per Region
By default, API Gateway limits the steady-state request rate per second (rps) across all APIs within an AWS account, per Region. It also limits the burst (that is, the maximum bucket size) across all APIs within an AWS account, per Region. In API Gateway, the burst limit corresponds to the maximum number of concurrent request submissions that API Gateway can fulfill at any moment without returning 429 Too Many Requests error responses. For more information on throttling quotas, see Amazon API Gateway quotas and important notes.
To help understand these throttling limits, here are a few examples, given a burst limit of 5,000 and an account-level rate limit of 10,000 requests per second in the Region:
- If a caller submits 10,000 requests in a one-second period evenly (for example, 10 requests every millisecond), API Gateway processes all requests without dropping any.
- If the caller sends 10,000 requests in the first millisecond, API Gateway serves 5,000 of those requests and throttles the rest in the one-second period.
- If the caller submits 5,000 requests in the first millisecond and then evenly spreads another 5,000 requests through the remaining 999 milliseconds (for example, about 5 requests every millisecond), API Gateway processes all 10,000 requests in the one-second period without returning 429 Too Many Requests error responses.
- If the caller submits 5,000 requests in the first millisecond and waits until the 101st millisecond to submit another 5,000 requests, API Gateway processes 6,000 requests and throttles the rest in the one-second period. This is because at the rate of 10,000 rps, API Gateway has served 1,000 requests after the first 100 milliseconds and thus emptied the bucket by the same amount. Of the next spike of 5,000 requests, 1,000 fill the bucket and are queued to be processed. The other 4,000 exceed the bucket capacity and are discarded.
- If the caller submits 5,000 requests in the first millisecond, submits 1,000 requests at the 101st millisecond, and then evenly spreads another 4,000 requests through the remaining 899 milliseconds, API Gateway processes all 10,000 requests in the one-second period without throttling.
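The arithmetic in these examples can be checked with a small simulation. The sketch below is a simplified model at millisecond granularity, not API Gateway's actual implementation: it assumes the bucket drains at exactly 10 requests per millisecond and that any request exceeding the remaining capacity is rejected immediately. The function names and scenario labels are made up for illustration.

```python
def simulate(arrivals, burst_limit=5000, rate_per_second=10000, duration_ms=1000):
    """Return (served, throttled) for a one-second window.

    arrivals maps a 0-based millisecond index to the number of requests
    submitted in that millisecond.
    """
    drain_per_ms = rate_per_second / 1000
    level = 0.0  # requests currently occupying the bucket
    served = throttled = 0
    for ms in range(duration_ms):
        if ms > 0:
            # Requests processed during the previous millisecond leave the bucket.
            level = max(0.0, level - drain_per_ms)
        submitted = arrivals.get(ms, 0)
        headroom = int(burst_limit - level)  # remaining bucket capacity
        accepted = min(submitted, headroom)
        level += accepted
        served += accepted
        throttled += submitted - accepted
    return served, throttled


def spread(total, start_ms, end_ms):
    """Spread total requests as evenly as possible over [start_ms, end_ms)."""
    slots = end_ms - start_ms
    base, extra = divmod(total, slots)
    return {start_ms + i: base + (1 if i < extra else 0) for i in range(slots)}


scenarios = {
    "even": spread(10000, 0, 1000),
    "single_spike": {0: 10000},
    "spike_then_even": {0: 5000, **spread(5000, 1, 1000)},
    "two_spikes": {0: 5000, 100: 5000},
    "spike_then_shaped": {0: 5000, 100: 1000, **spread(4000, 101, 1000)},
}
results = {name: simulate(arrivals) for name, arrivals in scenarios.items()}
for name, (served, throttled) in results.items():
    print(f"{name}: served={served} throttled={throttled}")
```

Under these assumptions the model reproduces the five outcomes listed above, including the 6,000 served and 4,000 throttled requests in the two-spike case.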
More generally, at any given moment, when a bucket contains b tokens and the maximum bucket capacity is B, the maximum number of additional tokens that can be added to the bucket is Δ = B - b. This maximum number of additional tokens corresponds to the maximum number of additional concurrent requests that a client can submit without receiving any 429 error responses. In general, Δ varies in time. The value ranges from zero when the bucket is full (that is, b = B) to B when the bucket is empty (that is, b = 0). The range depends on the request-processing rate (the rate at which tokens are removed from the bucket) and the rate limit (the rate at which tokens are added to the bucket).
The following schematic shows the general behaviors of Δ, the maximum additional concurrent requests, as a function of time. The schematic assumes that the tokens in the bucket decrease at a combined rate of r, starting from an empty bucket.

The account-level rate limit can be increased upon request. To request an increase of account-level throttling limits per Region, contact the AWS Support Center.
Default method throttling and overriding default method throttling
You can set the default method throttling to override the account-level request throttling limits for a specific stage or for individual methods in your API. The default method throttling limits are bounded by the account-level rate limits per Region, even if you set the default method throttling limits higher than the account-level limits.
You can set the default method throttling limits in the API Gateway console by using the Default Method Throttling setting in Stages. For instructions on using the console, see Update stage settings.
You can also set the default method throttling limits programmatically by calling the API Gateway API, as described in the API references.
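As one possible illustration of the programmatic route, the sketch below builds patch operations for a stage's throttling settings and passes them to the `update_stage` operation of the AWS SDK for Python (boto3). The helper names are hypothetical, and the path format (`/*/*/throttling/...` for the stage default, with `/` in a resource path escaped as `~1` for a method override) is stated here as an assumption that you should verify against the API references before relying on it.

```python
def throttling_patch_ops(rate_limit, burst_limit, resource_path="*", http_method="*"):
    """Build patch operations that set rate and burst limits on a stage.

    The defaults ("*", "*") target the stage-wide default method settings.
    Pass a resource path and HTTP method to override a single method.
    """
    # JSON Pointer escaping: "~" becomes "~0" and "/" becomes "~1".
    escaped = resource_path.replace("~", "~0").replace("/", "~1")
    prefix = f"/{escaped}/{http_method}/throttling"
    return [
        {"op": "replace", "path": f"{prefix}/rateLimit", "value": str(rate_limit)},
        {"op": "replace", "path": f"{prefix}/burstLimit", "value": str(burst_limit)},
    ]


def apply_stage_throttling(rest_api_id, stage_name, ops):
    """Send the patch operations to API Gateway (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("apigateway")
    return client.update_stage(
        restApiId=rest_api_id, stageName=stage_name, patchOperations=ops
    )


# Example values are placeholders, not recommendations.
default_ops = throttling_patch_ops(rate_limit=100, burst_limit=200)
pets_get_ops = throttling_patch_ops(50, 100, resource_path="/pets", http_method="GET")
```

Keeping the builder separate from the SDK call makes it easy to inspect or log the exact operations before applying them to a stage.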
Configuring API-level and stage-level throttling in a usage plan
In a usage plan, you can set a default per-method throttling limit for all methods at the API or stage level under Create Usage Plan as shown in Create a usage plan.
Configuring method-level throttling in a usage plan
You can set additional throttling limits at the method level in Usage Plans as shown in Create a usage plan. In the API Gateway console, these are set by specifying Resource=<resource>, Method=<method> in the Configure Method Throttling setting. For example, for the PetStore example, you might specify Resource=/pets, Method=GET.
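Outside the console, a usage plan with both a plan-wide limit and a method-level limit could be described as in the sketch below, which builds the arguments for the boto3 `create_usage_plan` operation. The helper name, the plan name, the API ID, and the limit values are all hypothetical. The `"/pets/GET"` key format for method-level throttling is an assumption based on the Resource and Method pair described above.

```python
def usage_plan_request(name, api_id, stage, rate_limit, burst_limit, method_limits=None):
    """Build keyword arguments for a usage plan with optional method-level limits.

    method_limits maps (resource, method) to (rate_limit, burst_limit),
    for example {("/pets", "GET"): (50, 100)}.
    """
    api_stage = {"apiId": api_id, "stage": stage}
    if method_limits:
        # Method-level limits are keyed by resource path and HTTP method.
        api_stage["throttle"] = {
            f"{resource}/{method}": {"rateLimit": float(rate), "burstLimit": int(burst)}
            for (resource, method), (rate, burst) in method_limits.items()
        }
    return {
        "name": name,
        "apiStages": [api_stage],
        # Plan-wide limit applied to each client (API key) in the plan.
        "throttle": {"rateLimit": float(rate_limit), "burstLimit": int(burst_limit)},
    }


request = usage_plan_request(
    name="example-plan",   # placeholder plan name
    api_id="a1b2c3d4e5",   # placeholder REST API ID
    stage="prod",
    rate_limit=100,
    burst_limit=200,
    method_limits={("/pets", "GET"): (50, 100)},
)
# With boto3 installed and credentials configured, the plan could be created with:
#   boto3.client("apigateway").create_usage_plan(**request)
```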