Retries - AWS SDK for Kotlin

Retries

Calls to AWS services occasionally fail with unexpected exceptions. Calls that fail with certain types of errors, such as throttling or transient errors, might succeed if they are retried.

This page describes how to configure automatic retries with the AWS SDK for Kotlin.

Default retry configuration

By default, every service client is automatically configured with a standard retry strategy. The default configuration attempts a failing call up to three times (the initial attempt plus two retries). The delay between attempts uses exponential backoff with random jitter to avoid retry storms. This configuration works for the majority of use cases but may be unsuitable in some circumstances, such as high-throughput systems.

The SDK attempts retries only on retryable errors. Examples of retryable errors are socket timeouts, service-side throttling, concurrency or optimistic lock failures, and transient service errors. Missing or invalid parameters, authentication/security errors, and misconfiguration exceptions are not considered retryable.

You can customize the standard retry strategy by setting the maximum attempts, delays and backoff, and token bucket configuration.

Maximum attempts

You can customize the default maximum attempts (3) in the retryStrategy DSL block during client construction.

val dynamoDb = DynamoDbClient.fromEnvironment {
    retryStrategy {
        maxAttempts = 5
    }
}

With the DynamoDB service client shown in the previous snippet, the SDK tries API calls that fail up to five times (the initial attempt plus four retries).

You can disable automatic retries completely by setting the maximum attempts to one as shown in the following snippet.

val dynamoDb = DynamoDbClient.fromEnvironment {
    retryStrategy {
        maxAttempts = 1 // The SDK makes no retries.
    }
}

Delays and backoff

If a retry is necessary, the default retry strategy waits before it makes the subsequent attempt. The delay for the first retry is small but it grows exponentially for later retries. The maximum amount of delay is capped so that it does not grow too large.

Finally, random jitter is applied to the delays between all attempts. The jitter helps mitigate the effect of large fleets that can cause retry storms. (See this AWS Architecture Blog post for a deeper discussion about exponential backoff and jitter.)

Delay parameters are configurable in the delayProvider DSL block.

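// The Duration extension properties come from the Kotlin standard library.
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.Duration.Companion.seconds

val dynamoDb = DynamoDbClient.fromEnvironment {
    retryStrategy {
        delayProvider {
            initialDelay = 100.milliseconds
            maxBackoff = 5.seconds
        }
    }
}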

With the configuration shown in the previous snippet, the client delays the first retry attempt for up to 100 milliseconds. The maximum delay before any retry attempt is 5 seconds.

The following parameters are available for tuning delays and backoff.

| Parameter | Default value | Description |
| --- | --- | --- |
| initialDelay | 10 milliseconds | The maximum amount of delay for the first retry. When jitter is applied, the actual amount of delay may be less. |
| jitter | 1.0 (full jitter) | The maximum amplitude by which to randomly reduce the calculated delay. The default value of 1.0 means that the calculated delay can be reduced to any amount up to 100% (for example, down to 0). A value of 0.5 means that the calculated delay can be reduced by up to half. Thus, a max delay of 10ms could be reduced to anywhere between 5ms and 10ms. A value of 0.0 means that no jitter is applied. Important: Jitter configuration is an advanced feature. Customizing this behavior is not normally recommended. |
| maxBackoff | 20 seconds | The maximum amount of delay to apply to any attempt. Setting this value limits the exponential growth that occurs between subsequent attempts and prevents the calculated maximum from being too large. This parameter limits the calculated delay before jitter is applied. If applied, jitter might reduce the delay even further. |
| scaleFactor | 1.5 | The exponential base by which subsequent maximum delays will be increased. |

For example, given an initialDelay of 10ms and a scaleFactor of 1.5, the following max delays would be calculated:

  • Retry 1: 10ms × 1.5⁰ = 10ms

  • Retry 2: 10ms × 1.5¹ = 15ms

  • Retry 3: 10ms × 1.5² = 22.5ms

  • Retry 4: 10ms × 1.5³ = 33.75ms

When jitter is applied, the actual amount of each delay might be less.
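
The arithmetic above can be reproduced with a short, self-contained sketch. It is only an illustrative model of the description on this page, not the SDK's implementation. The function names `maxDelayForRetry` and `jitteredDelay` are hypothetical, and the assumption that jitter removes a uniformly random fraction of the delay is a simplification.

```kotlin
import kotlin.math.pow
import kotlin.random.Random
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds
import kotlin.time.Duration.Companion.seconds

// Illustrative model only; these helpers are hypothetical and not part of the SDK.

/** Maximum delay before the given retry (1-based), capped at maxBackoff. */
fun maxDelayForRetry(
    retry: Int,
    initialDelay: Duration = 10.milliseconds,
    scaleFactor: Double = 1.5,
    maxBackoff: Duration = 20.seconds,
): Duration {
    require(retry >= 1) { "retry must be 1 or greater" }
    val uncapped = initialDelay * scaleFactor.pow(retry - 1)
    return minOf(uncapped, maxBackoff)
}

/** Reduces maxDelay by a random fraction in [0, jitter). Assumes uniform jitter. */
fun jitteredDelay(
    maxDelay: Duration,
    jitter: Double = 1.0,
    random: Random = Random.Default,
): Duration {
    require(jitter in 0.0..1.0) { "jitter must be between 0.0 and 1.0" }
    val reduction = if (jitter == 0.0) 0.0 else random.nextDouble(0.0, jitter)
    return maxDelay * (1.0 - reduction)
}

fun main() {
    for (retry in 1..4) {
        val max = maxDelayForRetry(retry)
        println("Retry $retry: max delay $max, actual delay ${jitteredDelay(max)}")
    }
}
```

With the default arguments, the maximum delays match the list above. The actual delays vary from run to run because of the random jitter.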

Retry token bucket

You can modify retry behavior further by using a token bucket algorithm. The token bucket helps to reduce retries that are less likely to succeed or that might take more time to resolve, such as retries after timeout and throttling failures.

Important

Token bucket configuration is an advanced feature. Customizing this behavior is not normally recommended.

Each retry attempt (optionally including the initial attempt) decrements some capacity from the token bucket. The amount decremented depends on the type of attempt. For example, retrying transient errors might be cheap, but retrying timeout or throttling errors might be more expensive.

A successful attempt returns capacity to the bucket. The bucket may not be incremented beyond its maximum capacity nor decremented below zero.

Depending on the value of the useCircuitBreakerMode setting, attempts to decrement capacity below zero result in one of the following outcomes:

  • An exception is thrown – For example, if too many retries have occurred and more retries are unlikely to succeed.

  • The attempt is delayed – For example, the attempt waits until the bucket has sufficient capacity again.
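
The bookkeeping described above can be sketched in plain Kotlin. This is a simplified model, not the SDK's implementation. The class and exception names are hypothetical, only the exception-throwing outcome is modeled, and the default costs are taken from the parameter table in this section.

```kotlin
// Illustrative model only. These names are hypothetical and not part of the SDK.
class RetryCapacityExceeded(message: String) : Exception(message)

class SimpleTokenBucket(
    private val maxCapacity: Int = 500,
    private val retryCost: Int = 5,
    private val timeoutRetryCost: Int = 10,
) {
    var capacity: Int = maxCapacity
        private set

    /** Pays for one retry and returns its cost; throws if capacity would fall below zero. */
    fun acquireRetry(afterTimeoutOrThrottling: Boolean): Int {
        val cost = if (afterTimeoutOrThrottling) timeoutRetryCost else retryCost
        if (cost > capacity) {
            throw RetryCapacityExceeded("Need $cost units but only $capacity remain")
        }
        capacity -= cost
        return cost
    }

    /** Returns capacity after a successful attempt, never exceeding maxCapacity. */
    fun returnCapacity(amount: Int) {
        capacity = minOf(maxCapacity, capacity + amount)
    }
}

fun main() {
    val bucket = SimpleTokenBucket(maxCapacity = 20)

    // A retry after a transient failure succeeds, so its cost is returned.
    val cost = bucket.acquireRetry(afterTimeoutOrThrottling = false) // 20 -> 15
    bucket.returnCapacity(cost)                                      // 15 -> 20

    // Two retries after timeouts both fail, which drains the bucket.
    bucket.acquireRetry(afterTimeoutOrThrottling = true)             // 20 -> 10
    bucket.acquireRetry(afterTimeoutOrThrottling = true)             // 10 -> 0

    try {
        bucket.acquireRetry(afterTimeoutOrThrottling = false)
    } catch (e: RetryCapacityExceeded) {
        println("Retry refused: ${e.message}")
    }
}
```

The small `maxCapacity` of 20 is chosen only to make the bucket run dry quickly in the example.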

The token bucket parameters are configurable in the tokenBucket DSL block:

val dynamoDb = DynamoDbClient.fromEnvironment {
    retryStrategy {
        tokenBucket {
            maxCapacity = 100
            refillUnitsPerSecond = 2
        }
    }
}

The following parameters are available for tuning the retry token bucket:

| Parameter | Default value | Description |
| --- | --- | --- |
| initialTryCost | 0 | The amount to decrement from the bucket for initial attempts. The default value of 0 means that no capacity is decremented and thus initial attempts are not stopped or delayed. |
| initialTrySuccessIncrement | 1 | The amount by which to increment capacity when the initial attempt is successful. |
| maxCapacity | 500 | The maximum capacity of the token bucket. The number of available tokens cannot exceed this number. |
| refillUnitsPerSecond | 0 | The amount of capacity re-added to the bucket every second. A value of 0 means that no capacity is automatically re-added (that is, only successful attempts increment capacity). A value of 0 requires useCircuitBreakerMode to be TRUE. |
| retryCost | 5 | The amount to decrement from the bucket for an attempt following a transient failure. The same amount is returned to the bucket if the attempt is successful. |
| timeoutRetryCost | 10 | The amount to decrement from the bucket for an attempt following a timeout or throttling failure. The same amount is returned to the bucket if the attempt is successful. |
| useCircuitBreakerMode | TRUE | Determines the behavior when an attempt to decrement capacity would cause the bucket's capacity to fall below zero. When TRUE, the token bucket throws an exception indicating that no more retry capacity exists. When FALSE, the token bucket delays the attempt until sufficient capacity has refilled. |
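
As a sketch of how these parameters combine, the following configuration turns off circuit breaker mode so that attempts wait for capacity instead of failing. The values are arbitrary placeholders rather than recommendations, the helper function name is hypothetical, and the import path assumes the standard DynamoDB service client package. Because a refillUnitsPerSecond of 0 requires circuit breaker mode, the refill rate is set to a non-zero value.

```kotlin
import aws.sdk.kotlin.services.dynamodb.DynamoDbClient

// Hypothetical helper; the numbers below are placeholders, not tuned values.
suspend fun newDelayingClient(): DynamoDbClient =
    DynamoDbClient.fromEnvironment {
        retryStrategy {
            tokenBucket {
                // Delay attempts until capacity refills instead of throwing.
                useCircuitBreakerMode = false
                // Must be non-zero when circuit breaker mode is off.
                refillUnitsPerSecond = 10
            }
        }
    }
```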

Adaptive retries

As an alternative to the standard retry strategy, the adaptive retry strategy is an advanced approach that seeks the ideal request rate to minimize throttling errors.

Important

Adaptive retries is an advanced retry mode. Using this retry strategy is not normally recommended.

Adaptive retries includes all the features of standard retries. It adds a client-side rate limiter that measures the rate of throttled requests compared to non-throttled requests. It also limits traffic to attempt to stay within a safe bandwidth, ideally causing zero throttling errors.

The rate adapts in real time to changing service conditions and traffic patterns and might increase or decrease the rate of traffic accordingly. Critically, the rate limiter might delay initial attempts in high-traffic scenarios.

You select the adaptive retry strategy by providing an additional parameter to the retryStrategy method. The rate limiter parameters are configurable in the rateLimiter DSL block.

val dynamoDb = DynamoDbClient.fromEnvironment {
    retryStrategy(AdaptiveRetryStrategy) {
        maxAttempts = 10
        rateLimiter {
            minFillRate = 1.0
            smoothing = 0.75
        }
    }
}

Note

The adaptive retry strategy assumes that the client works against a single resource (for example, one DynamoDB table or one Amazon S3 bucket).

If you use a single client for multiple resources, throttling or outages associated with one resource result in increased latency and failures when the client accesses all other resources. When you use the adaptive retry strategy, we recommend that you use a single client for each resource.
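
One way to follow this recommendation is to build a separate client for each resource, as in the following sketch. The helper function, the variable names, and the two tables they refer to are hypothetical. The import paths are assumptions based on the SDK's usual package layout.

```kotlin
import aws.sdk.kotlin.services.dynamodb.DynamoDbClient
import aws.smithy.kotlin.runtime.retries.AdaptiveRetryStrategy

// Hypothetical helper that builds a client with its own adaptive rate limiter.
suspend fun newAdaptiveClient(): DynamoDbClient =
    DynamoDbClient.fromEnvironment {
        retryStrategy(AdaptiveRetryStrategy) {
            maxAttempts = 10
        }
    }

suspend fun main() {
    // One client per table, so throttling on one table does not slow down the other.
    val ordersClient = newAdaptiveClient() // used only for a hypothetical "Orders" table
    val usersClient = newAdaptiveClient()  // used only for a hypothetical "Users" table
    try {
        // Issue requests for each table through its dedicated client here.
    } finally {
        ordersClient.close()
        usersClient.close()
    }
}
```

Each client keeps its own rate limiter state, which is the reason for keeping them separate.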