Best practices for implementing partial batch responses - AWS Prescriptive Guidance

The following are best practices for configuring partial batch responses for Amazon SQS event sources:

  • Configure a dead-letter queue to avoid creating a snowball anti-pattern in your serverless application’s architecture. For more information, see the Avoiding snowball anti-patterns section of this guide.

  • Configure your Lambda function event source mapping to make only the failed messages visible. To do this, you must include the value ReportBatchItemFailures in the FunctionResponseTypes list when configuring your event source mapping. For more information, see Implementing partial batch responses in the AWS Lambda Developer Guide.

  • Define the number of times that a message can be received from the source queue before it’s moved to the dead-letter queue. Choose a value that fits your application’s use case by identifying the most likely causes of failure and their estimated recovery times. To define the number of retries, configure the maxReceiveCount value in the source queue’s RedrivePolicy attribute. For more information, see SetQueueAttributes in the Amazon SQS API Reference. Also, see Introducing Amazon Simple Queue Service dead-letter queue redrive to source queues on the AWS Blog.

  • Make sure that your Lambda function code is idempotent and can safely process the same message more than once, because failed messages are retried and Amazon SQS can deliver a message multiple times. This prepares the function’s code to handle each job in an Amazon SQS message batch individually. A good starting point is incorporating ReportBatchItemFailures in your event source mapping configuration. For more information, see Reporting batch item failures in the AWS Lambda Developer Guide. Also, see How can I prevent an Amazon SQS message from invoking my Lambda function more than once?

  • Consider using tools such as aws-embedded-metrics or Powertools for AWS Lambda (Python). These tools help you incorporate business metrics in your function code to track failed jobs and the details on those jobs.

  • If you're using this feature with a First-In-First-Out (FIFO) queue, your function should stop processing messages after the first failure and return all failed and unprocessed messages in batchItemFailures. This helps preserve the ordering of messages in your queue.
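The handler behavior described in the preceding list can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: process_message and the "fail" field in the message body are hypothetical stand-ins for your own business logic, and the stop_on_first_failure flag is an assumed way to switch between standard and FIFO behavior. The batchItemFailures and itemIdentifier keys are the response format that Lambda expects.

```python
import json


def process_message(body: str) -> None:
    """Hypothetical business logic: raises an exception when a job fails."""
    job = json.loads(body)
    if job.get("fail"):  # "fail" is an illustrative field, not an SQS attribute
        raise ValueError("job could not be processed")


def process_batch(records, stop_on_first_failure=False):
    """Process each record and build a partial batch response.

    For FIFO queues, pass stop_on_first_failure=True so that the first
    failed message and every message after it are reported as failed,
    which keeps those messages in their original order.
    """
    failures = []
    for index, record in enumerate(records):
        try:
            process_message(record["body"])
        except Exception:
            if stop_on_first_failure:
                # Report the failed message and all unprocessed messages.
                failures = [
                    {"itemIdentifier": r["messageId"]} for r in records[index:]
                ]
                break
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


def handler(event, context):
    """Lambda entry point for a standard (non-FIFO) queue."""
    return process_batch(event.get("Records", []))
```

Returning an empty batchItemFailures list tells Lambda that every message in the batch was processed successfully. For a FIFO queue, the handler would call process_batch with stop_on_first_failure=True instead.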

Note

Code-level performance tracking is required to track the overall performance of an application that uses partial batch processing. After partial batch processing is configured, Lambda function invocations almost always succeed, no matter what the result of the batch processing is.
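One possible way to add this kind of code-level tracking is to log a structured summary of every batch before the function returns. The following sketch uses only the Python standard library. The field names (batch_size, reported_failures, and so on) are assumptions chosen for illustration, and tools such as aws-embedded-metrics or Powertools for AWS Lambda (Python) could take the place of the plain logger.

```python
import json
import logging

logger = logging.getLogger(__name__)


def summarize_batch(event, response):
    """Count how many messages in the batch were reported as failed."""
    total = len(event.get("Records", []))
    failed_ids = [
        item["itemIdentifier"] for item in response.get("batchItemFailures", [])
    ]
    return {
        "batch_size": total,
        "reported_failures": len(failed_ids),
        "succeeded": total - len(failed_ids),
        "failed_message_ids": failed_ids,
    }


def log_batch_summary(event, response):
    """Write the summary as one JSON log line and return it."""
    summary = summarize_batch(event, response)
    logger.info(json.dumps(summary))
    return summary
```

Calling log_batch_summary(event, response) just before the handler returns its partial batch response gives you one log line per invocation that can be queried or turned into a metric. For FIFO queues, reported_failures also includes the unprocessed messages that follow the first failure.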

Avoiding snowball anti-patterns

Lambda and Amazon SQS can’t control the messages that upstream microservices write to an Amazon SQS queue. If there are messages that can’t be processed, those messages become visible again in the source Amazon SQS queue. The Lambda function then receives them again in later message batches, fails to process them again, and the messages return to the queue to be retried. If no dead-letter queue exists, these messages keep circulating until their retention period expires, and the number of unprocessed messages in the Amazon SQS queue can eventually outnumber the valid messages.

This type of snowball anti-pattern—where each successive Lambda function invocation makes the problem worse—can cause the following issues:

  • Poor user experience because the jobs take much longer to process than usual, or don’t process at all

  • Increased cost, which grows with the number of messages accumulating in the Amazon SQS queue and the number of message retries

  • Reduced Lambda computing capacity for the application or entire AWS account if the function doesn’t have a limit on its invocation requests
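The accumulation described in this section can be illustrated with a toy model. The following sketch is a simplification under stated assumptions, not a model of the actual Amazon SQS internals: it tracks only unprocessable ("poison") messages, assumes that every receive of such a message fails, and moves a message to the dead-letter queue as soon as its receive count reaches max_receive_count. All function and parameter names are hypothetical.

```python
from collections import deque


def simulate_backlog(rounds, batch_size, poison_per_round, max_receive_count=None):
    """Return the poison-message backlog after each round and the DLQ total.

    Each round, upstream services add poison_per_round unprocessable
    messages, and the function receives one batch of up to batch_size.
    """
    queue = deque()  # each entry is the receive count of one poison message
    dead_lettered = 0
    history = []
    for _ in range(rounds):
        for _ in range(poison_per_round):
            queue.append(0)
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        for receives in batch:
            receives += 1  # processing fails, so the message is retried
            if max_receive_count is not None and receives >= max_receive_count:
                dead_lettered += 1  # moved out of the source queue
            else:
                queue.append(receives)  # returns to the source queue
        history.append(len(queue))
    return history, dead_lettered


without_dlq, _ = simulate_backlog(rounds=5, batch_size=10, poison_per_round=2)
with_dlq, moved = simulate_backlog(
    rounds=5, batch_size=10, poison_per_round=2, max_receive_count=3
)
print(without_dlq)      # [2, 4, 6, 8, 10]
print(with_dlq, moved)  # [2, 4, 4, 4, 4] 6
```

In this model, the backlog without a dead-letter queue keeps growing every round and consumes a growing share of each batch, while the backlog with a dead-letter queue levels off.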

To avoid creating a snowball anti-pattern when configuring partial batch responses in Amazon SQS, it’s a best practice to also create a dead-letter queue. This separate queue can store messages that aren’t processed successfully and help you better manage the lifecycle of your application’s unprocessed messages.

For instructions, see Configuring a dead-letter queue (console) in the Amazon SQS Developer Guide.
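If you prefer to apply these settings programmatically instead of through the console, the following Python sketch shows one way to do it with the AWS SDK for Python (Boto3). It assumes that the source queue, the dead-letter queue, and the Lambda function already exist. The helper function names, the default maxReceiveCount of 5, and the batch size of 10 are illustrative assumptions, not recommendations from this guide.

```python
import json


def build_redrive_attributes(dlq_arn, max_receive_count):
    """Build the queue attributes that attach a dead-letter queue."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    # SetQueueAttributes expects RedrivePolicy as a JSON-encoded string.
    return {"RedrivePolicy": json.dumps(redrive_policy)}


def build_event_source_mapping_args(queue_arn, function_name, batch_size=10):
    """Build event source mapping arguments that enable partial batch responses."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }


def apply_configuration(queue_url, queue_arn, dlq_arn, function_name,
                        max_receive_count=5):
    """Apply both settings. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builders above work without the SDK

    boto3.client("sqs").set_queue_attributes(
        QueueUrl=queue_url,
        Attributes=build_redrive_attributes(dlq_arn, max_receive_count),
    )
    boto3.client("lambda").create_event_source_mapping(
        **build_event_source_mapping_args(queue_arn, function_name)
    )
```

Keeping the two builder functions separate from the API calls makes it possible to review or unit test the configuration values without touching an AWS account.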