Understanding SQS retries
Retry behavior for SQS messages is determined by the SQS queue configuration, where you set the visibility timeout, message retention period, and delivery delay.
If a Lambda function throws an error, the Lambda service continues to process the failed message until:
- The message is processed without any error from the function, and the service deletes the message from the queue.
- The message retention period is reached and SQS deletes the message from the queue.
- A dead-letter queue (DLQ) is configured and SQS moves the message to it once the message has been received more than maxReceiveCount times. It’s best practice to enable a DLQ on an SQS queue to prevent any message loss.
Lambda does not delete messages from the queue unless there is a successful invocation. By default, if any messages in a batch fail, all messages in the batch become visible in the queue again for reprocessing, including messages that Lambda processed successfully. To avoid this, enable ReportBatchItemFailures on the event source mapping and report individual message failures using batchItemFailures in the function response. Only the failed items are then reprocessed.
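As a minimal sketch of a partial batch response, the Python handler below collects the messageId of each record that fails and returns those IDs under batchItemFailures. The process_record function and its order_id check are hypothetical stand-ins for real business logic, and the sketch assumes ReportBatchItemFailures is enabled on the event source mapping.

```python
import json


def process_record(body: str) -> None:
    """Hypothetical business logic: raises if the payload is unusable."""
    payload = json.loads(body)  # raises ValueError on malformed JSON
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def handler(event, context):
    """Process an SQS batch and report only the records that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_record(record["body"])
        except Exception:
            # Report this message only; the others in the batch are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning an empty batchItemFailures list tells Lambda that the whole batch succeeded, so every message in it is deleted from the queue.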
In an application under heavy load or with spiky traffic patterns, it’s recommended that you:
- Set the queue’s visibility timeout to at least six times the function timeout value. This allows the function time to process each batch of records if the function execution is throttled while processing a previous batch.
- Set the maxReceiveCount on the source queue’s redrive policy to at least 5. This improves the chances of messages being processed before reaching the DLQ.
- Ensure idempotency to allow messages to be safely processed more than once.
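The first two recommendations can be sketched as queue attributes. The helper below is illustrative only: build_queue_attributes, its parameters, and the example ARN are hypothetical names, while VisibilityTimeout and RedrivePolicy (with deadLetterTargetArn and maxReceiveCount) are the SQS attribute names it assumes.

```python
import json


def build_queue_attributes(function_timeout_seconds: int,
                           dlq_arn: str,
                           max_receive_count: int = 5) -> dict:
    """Build SQS attributes that follow the guidance above (hypothetical helper)."""
    if function_timeout_seconds <= 0:
        raise ValueError("function timeout must be positive")
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    return {
        # Six times the function timeout, in seconds; SQS expects string values.
        "VisibilityTimeout": str(6 * function_timeout_seconds),
        # The redrive policy is passed as a JSON-encoded string.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }


# Example: a function with a 30-second timeout and a placeholder DLQ ARN.
attributes = build_queue_attributes(
    30, "arn:aws:sqs:us-east-1:123456789012:example-dlq"
)
# With boto3 installed, these could be applied with something like:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=attributes)
```

Deriving the visibility timeout from the function timeout in one place keeps the two values from drifting apart when the function timeout is later changed.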
Note that an SQS DLQ is different from a Lambda DLQ, which is used for the function’s asynchronous invocation queue, not for event source queues.