To handle errors related to an SQS event source, Lambda automatically retries failed invocations using a backoff strategy. You can also customize error handling behavior by configuring your SQS event source mapping to return partial batch responses.
Backoff strategy for failed invocations
When an invocation fails, Lambda attempts to retry the invocation while implementing a backoff strategy. The backoff strategy differs slightly depending on whether Lambda encountered the failure due to an error in your function code, or due to throttling.
- If your function code caused the error, Lambda stops processing and retrying the invocation. In the meantime, Lambda gradually backs off, reducing the amount of concurrency allocated to your Amazon SQS event source mapping. After your queue's visibility timeout runs out, the message reappears in the queue.
- If the invocation fails due to throttling, Lambda gradually backs off retries by reducing the amount of concurrency allocated to your Amazon SQS event source mapping. Lambda continues to retry the message until the message's timestamp exceeds your queue's visibility timeout, at which point Lambda drops the message.
Implementing partial batch responses
When your Lambda function encounters an error while processing a batch, all messages in that batch become visible in the queue again by default, including messages that Lambda processed successfully. As a result, your function can end up processing the same message several times.
To avoid reprocessing successfully processed messages in a failed batch, you can configure your event source mapping to make only the failed messages visible again. This is called a partial batch response.
To turn on partial batch responses, specify ReportBatchItemFailures for the FunctionResponseTypes parameter when configuring your event source mapping. This lets your function return a partial success, which can help reduce the number of unnecessary retries on records.
When ReportBatchItemFailures is activated, Lambda doesn't scale down message polling when function invocations fail. If you expect some messages to fail, and you don't want those failures to impact the message processing rate, use ReportBatchItemFailures.
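As a rough illustration, the same setting can also be applied from Python. The sketch below assumes a boto3 Lambda client is passed in by the caller; the helper name enable_partial_batch_responses is hypothetical and exists only for this example, while update_event_source_mapping with the UUID and FunctionResponseTypes arguments is the SDK call it relies on.

```python
def enable_partial_batch_responses(lambda_client, mapping_uuid):
    """Turn on ReportBatchItemFailures for one event source mapping.

    lambda_client is assumed to be a boto3 Lambda client, for example
    boto3.client("lambda"). This helper name is hypothetical.
    """
    return lambda_client.update_event_source_mapping(
        UUID=mapping_uuid,
        FunctionResponseTypes=["ReportBatchItemFailures"],
    )


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# enable_partial_batch_responses(
#     boto3.client("lambda"), "a1b2c3d4-5678-90ab-cdef-11111EXAMPLE"
# )
```

Passing the client in, rather than creating it inside the helper, keeps the sketch easy to exercise with a stub client.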
Note
Keep the following in mind when using partial batch responses:
- If your function throws an exception, the entire batch is considered a complete failure.
- If you're using this feature with a FIFO queue, your function should stop processing messages after the first failure and return all failed and unprocessed messages in batchItemFailures. This helps preserve the ordering of messages in your queue.
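The FIFO guidance in this note can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: fifo_handler and process_message are hypothetical names, and process_message stands in for whatever work your function does, raising an exception when a message fails.

```python
def process_message(record):
    """Hypothetical placeholder for real work; raises on failure."""
    if not record.get("body"):
        raise ValueError("No body in SQS message.")


def fifo_handler(event, context):
    """Stop at the first failure and report it plus every later message."""
    records = event.get("Records", [])
    batch_item_failures = []
    for index, record in enumerate(records):
        try:
            process_message(record)
        except Exception:
            # Report the failed message and all messages after it, which
            # were never processed, so that queue ordering is preserved.
            batch_item_failures = [
                {"itemIdentifier": later["messageId"]}
                for later in records[index:]
            ]
            break
    return {"batchItemFailures": batch_item_failures}
```

Catching the exception inside the loop matters here: if it escaped the handler, the entire batch would be treated as a complete failure.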
To activate partial batch reporting
- Review the Best practices for implementing partial batch responses.
- Run the following command to activate ReportBatchItemFailures for your function. To retrieve your event source mapping's UUID, run the list-event-source-mappings AWS CLI command.

      aws lambda update-event-source-mapping \
          --uuid "a1b2c3d4-5678-90ab-cdef-11111EXAMPLE" \
          --function-response-types "ReportBatchItemFailures"

- Update your function code to catch all exceptions and return failed messages in a batchItemFailures JSON response. The batchItemFailures response must include a list of message IDs, as itemIdentifier JSON values.

  For example, suppose you have a batch of five messages, with message IDs id1, id2, id3, id4, and id5. Your function successfully processes id1, id3, and id5. To make messages id2 and id4 visible again in your queue, your function should return the following response:

      {
        "batchItemFailures": [
          { "itemIdentifier": "id2" },
          { "itemIdentifier": "id4" }
        ]
      }

Here are some examples of function code that return the list of failed message IDs in the batch:
- SDK for .NET
Note
There's more on GitHub. Find the complete example and learn how to set up and run it in the Serverless examples repository.
Reporting SQS batch item failures with Lambda using .NET.
// Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
using Amazon.Lambda.Core;
using Amazon.Lambda.SQSEvents;

// Assembly attribute to enable the Lambda function's JSON input to be converted into a .NET class.
[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]
namespace sqsSample;

public class Function
{
    public async Task<SQSBatchResponse> FunctionHandler(SQSEvent evnt, ILambdaContext context)
    {
        List<SQSBatchResponse.BatchItemFailure> batchItemFailures = new List<SQSBatchResponse.BatchItemFailure>();
        foreach (var message in evnt.Records)
        {
            try
            {
                // Process your message
                await ProcessMessageAsync(message, context);
            }
            catch (System.Exception)
            {
                // Add failed message identifier to the batchItemFailures list
                batchItemFailures.Add(new SQSBatchResponse.BatchItemFailure { ItemIdentifier = message.MessageId });
            }
        }
        return new SQSBatchResponse(batchItemFailures);
    }

    private async Task ProcessMessageAsync(SQSEvent.SQSMessage message, ILambdaContext context)
    {
        if (String.IsNullOrEmpty(message.Body))
        {
            throw new Exception("No Body in SQS Message.");
        }
        context.Logger.LogInformation($"Processed message {message.Body}");
        // TODO: Do interesting work based on the new message
        await Task.CompletedTask;
    }
}
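For comparison, the same pattern can be sketched in Python. This is an illustrative version written for this section, not the example from the Serverless examples repository; process_message is a hypothetical placeholder that fails on an empty body, mirroring the .NET code.

```python
def process_message(record):
    """Hypothetical placeholder for real work; raises on failure."""
    if not record.get("body"):
        raise ValueError("No body in SQS message.")
    print(f"Processed message {record['body']}")


def lambda_handler(event, context):
    """Process every record and report only the ones that failed."""
    batch_item_failures = []
    for record in event.get("Records", []):
        try:
            process_message(record)
        except Exception:
            # Record the failed message ID so only this message
            # becomes visible in the queue again.
            batch_item_failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": batch_item_failures}
```

With a batch of five messages where id2 and id4 have empty bodies, this handler returns the id2 and id4 response described earlier.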
If the failed events do not return to the queue, see How do I troubleshoot Lambda function SQS ReportBatchItemFailures?
Success and failure conditions
Lambda treats a batch as a complete success if your function returns any of the following:
- An empty batchItemFailures list
- A null batchItemFailures list
- An empty EventResponse
- A null EventResponse
Lambda treats a batch as a complete failure if your function returns any of the following:
- An invalid JSON response
- An empty string itemIdentifier
- A null itemIdentifier
- An itemIdentifier with a bad key name
- An itemIdentifier value with a message ID that doesn't exist
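To check a handler's return value against these conditions before deploying, a small local helper can be useful. The sketch below is an approximation built only from the two lists above; it is not Lambda's actual validation logic. The name classify_batch_response is hypothetical, the function takes an already-parsed response (so invalid JSON is out of scope), and treating a non-dict response or a non-list batchItemFailures as a failure is an assumption.

```python
def classify_batch_response(response, batch_message_ids):
    """Return "success", "partial", or "failure" for a parsed response.

    Approximates the documented conditions; not Lambda's real validator.
    """
    # A null or empty response counts as a complete success.
    if response is None or response == {}:
        return "success"
    if not isinstance(response, dict):
        return "failure"  # assumption: unexpected shapes fail the batch

    failures = response.get("batchItemFailures")
    # A null or empty batchItemFailures list counts as a complete success.
    if failures is None or failures == []:
        return "success"
    if not isinstance(failures, list):
        return "failure"  # assumption

    known_ids = set(batch_message_ids)
    for item in failures:
        # A bad key name means itemIdentifier is missing.
        if not isinstance(item, dict) or "itemIdentifier" not in item:
            return "failure"
        identifier = item["itemIdentifier"]
        # Null, empty, or non-string identifiers fail the batch.
        if not isinstance(identifier, str) or identifier == "":
            return "failure"
        # So does a message ID that is not part of this batch.
        if identifier not in known_ids:
            return "failure"
    return "partial"
```

A unit test can call this helper with the handler's output and the message IDs from the test event, and fail whenever the result is "failure".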
CloudWatch metrics
To determine whether your function is correctly reporting batch item failures, you can monitor the NumberOfMessagesDeleted and ApproximateAgeOfOldestMessage Amazon SQS metrics in Amazon CloudWatch.
- NumberOfMessagesDeleted tracks the number of messages removed from your queue. If this drops to 0, this is a sign that your function response is not correctly returning failed messages.
- ApproximateAgeOfOldestMessage tracks how long the oldest message has stayed in your queue. A sharp increase in this metric can indicate that your function is not correctly returning failed messages.
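One way to check the first of these metrics from a script is sketched below. It assumes a boto3 CloudWatch client is passed in and that the queue publishes to the AWS/SQS namespace under the QueueName dimension; the helper name messages_deleted_in_window and the 15-minute default window are illustrative choices, not recommendations from this guide.

```python
from datetime import datetime, timedelta, timezone


def messages_deleted_in_window(cloudwatch_client, queue_name, minutes=15):
    """Sum NumberOfMessagesDeleted for a queue over the last `minutes`.

    cloudwatch_client is assumed to be a boto3 CloudWatch client, for
    example boto3.client("cloudwatch"). This helper name is hypothetical.
    """
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(minutes=minutes)
    result = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/SQS",
        MetricName="NumberOfMessagesDeleted",
        Dimensions=[{"Name": "QueueName", "Value": queue_name}],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,  # five-minute buckets
        Statistics=["Sum"],
    )
    # Add up the per-bucket sums; no datapoints yields 0.
    return sum(point["Sum"] for point in result.get("Datapoints", []))
```

A result of 0 while messages are still arriving would match the warning sign described above and is worth investigating.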