Amazon Simple Queue Service
Developer Guide

Increasing Throughput with Horizontal Scaling and Batching

Amazon SQS queues can deliver very high throughput (many thousands of messages per second). To achieve this throughput, you must scale message producers and consumers horizontally (add more producers and consumers).

In addition to horizontal scaling, batching provides higher throughput with fewer threads, connections, and requests than individual message requests would need. You can use batched Amazon SQS API actions to send, receive, or delete up to 10 messages at a time. Because Amazon SQS charges by the request instead of by the message, batching can also substantially reduce costs.

Horizontal Scaling

Because you access Amazon SQS through an HTTP request-response protocol, the request latency (the time interval between initiating a request and receiving a response) limits the throughput that you can achieve from a single thread over a single connection. For example, if the latency from an Amazon Elastic Compute Cloud (Amazon EC2) based client to Amazon SQS in the same region averages around 20 ms, the maximum throughput from a single thread over a single connection averages 50 operations per second.

Horizontal scaling means increasing the number of your message producers (making SendMessage requests) and consumers (making ReceiveMessage and DeleteMessage requests) in order to increase your overall queue throughput. You can scale horizontally by increasing the number of threads on a client, adding clients, or both. You should achieve essentially linear gains in queue throughput as you add more clients. For example, if you double the number of clients, you can get twice the throughput.
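
The arithmetic behind these figures can be sketched in a few lines of Java. This is a rough illustration, not a benchmark: the class and method names are hypothetical, and the sketch assumes that each thread issues one request at a time over its own connection and that throughput scales linearly with the number of threads, as described above.

```java
// Hypothetical helper: a back-of-the-envelope estimate, not a measurement.
final class ThroughputEstimate {

    // One thread on one connection completes at most 1000 / latencyMs requests per second.
    static double requestsPerSecondPerThread(double latencyMs) {
        return 1000.0 / latencyMs;
    }

    // Assumes linear scaling: every added thread contributes the same request rate.
    static double requestsPerSecond(int threads, double latencyMs) {
        return threads * requestsPerSecondPerThread(latencyMs);
    }

    public static void main(String[] args) {
        System.out.println(requestsPerSecond(1, 20.0)); // one thread at 20 ms latency
        System.out.println(requestsPerSecond(2, 20.0)); // doubling the number of threads
    }
}
```

With 20 ms of latency, the estimate is 50 requests per second for one thread and 100 for two, which matches the example in the text.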


As you scale horizontally, make sure that the Amazon SQS client that you use has enough connections or threads to support the number of concurrent message producers and consumers that send requests and receive responses. For example, by default, instances of the AWS SDK for Java AmazonSQSClient class maintain at most 50 connections to Amazon SQS. To create additional concurrent producers and consumers, you’ll need to adjust that limit. For example, in the AWS SDK for Java, you can set the maximum number of connections to match the number of producer and consumer threads on an AmazonSQSClientBuilder object as follows:

    AmazonSQS sqsClient = AmazonSQSClientBuilder.standard()
            .withClientConfiguration(new ClientConfiguration()
                    .withMaxConnections(producerCount + consumerCount))
            .build();

For the SDK for Java asynchronous client AmazonSQSAsyncClient, you’ll also need to make sure there are enough threads available. For more information, consult the documentation for the SDK library that you're using.


Batching

The batching actions in the Amazon SQS API (SendMessageBatch and DeleteMessageBatch) can further optimize throughput by processing up to ten messages at a time. ReceiveMessage can already return up to ten messages per request, so there is no ReceiveMessageBatch action.

The basic idea of batching is to perform more work in each round trip to the service (e.g., sending multiple messages with a single SendMessageBatch request), and to distribute the latency of the batch operation over the multiple messages in the batch request, as opposed to accepting the entire latency for a single message (for example, a SendMessage request). Because each round-trip carries more work, batch requests make more efficient use of threads and connections and so improve throughput. Amazon SQS charges by the request, so the cost can be greatly reduced when fewer requests are processing the same number of messages. Moreover, fewer threads and connections reduce client-side resource utilization and can reduce client-side cost by doing the same work with smaller or fewer hosts.
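
As a minimal sketch of the accumulation step, the following Java helper splits a list of pending messages into groups of at most ten, so that each group could be sent with one batch request. The class and method names are hypothetical and are not part of the AWS SDK; the sketch only shows how the request count shrinks and does not call Amazon SQS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the AWS SDK.
final class BatchPartitioner {

    // Amazon SQS batch actions accept up to 10 messages per request.
    static final int MAX_BATCH_SIZE = 10;

    // Splits items into consecutive batches of at most maxBatchSize elements.
    static <T> List<List<T>> partition(List<T> items, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            messages.add("message-" + i);
        }
        List<List<String>> batches = partition(messages, MAX_BATCH_SIZE);
        // 25 messages need 3 batch requests instead of 25 single requests.
        System.out.println(batches.size() + " requests for " + messages.size() + " messages");
    }
}
```

Each resulting group would map to one SendMessageBatch or DeleteMessageBatch request, so the latency of a round trip is shared by up to ten messages.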

Batching does add some complexity to the application. For example, the application has to accumulate messages before sending them, and it sometimes has to wait longer for a response. Even so, batching can be effective in the following circumstances:

  • Your application is generating a lot of messages in a short time, so the delay is never very long.

  • A message consumer fetches messages from a queue at its discretion, as opposed to typical message producers that need to send messages in response to events they don't control.


A batch request (SendMessageBatch or DeleteMessageBatch) may succeed even though individual messages in the batch have failed. After a batch request, you should always check for individual message failures and retry them if necessary.
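
One way to structure this check is sketched below. The BatchSender interface is a hypothetical stand-in for a batch call such as SendMessageBatch; it is an assumption made for illustration and not an AWS SDK type. The sketch resends only the entries that were reported as failed and gives up after a fixed number of attempts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchRetry {

    // Hypothetical stand-in for a batch call such as SendMessageBatch:
    // takes entries keyed by ID and returns the IDs of the entries that failed.
    interface BatchSender {
        List<String> send(Map<String, String> entriesById);
    }

    // Sends the batch, then resends only the failed entries, for at most maxAttempts sends.
    // Returns the IDs that still failed after the last attempt (empty if all succeeded).
    static List<String> sendWithRetry(BatchSender sender,
                                      Map<String, String> entriesById,
                                      int maxAttempts) {
        Map<String, String> pending = new LinkedHashMap<>(entriesById);
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            List<String> failedIds = sender.send(pending);
            Map<String, String> retry = new LinkedHashMap<>();
            for (String id : failedIds) {
                retry.put(id, pending.get(id)); // keep only the entries that failed
            }
            pending = retry;
        }
        return new ArrayList<>(pending.keySet());
    }
}
```

A caller would treat a non-empty return value as messages that were never accepted and decide whether to log them, retry later, or fail.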

You can take advantage of batching without changing your producers and consumers by using the Amazon SQS Buffered Asynchronous Client.


The example presented in this section implements a simple producer-consumer pattern. The complete example is available as a free download.

The main thread spawns a number of producer and consumer threads that process 1 KB messages for a specified time. The example includes producers and consumers that make single-operation requests and others that make batch requests.
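
The overall shape of such a program can be sketched without the AWS SDK. In the following illustration, an in-memory BlockingQueue stands in for the Amazon SQS queue, and the class name, thread counts, and timeouts are assumptions chosen for the sketch; it is not the downloadable sample itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class ProducerConsumerSketch {

    // Runs producer and consumer threads for runMillis, then returns {produced, consumed}.
    static int[] run(int producers, int consumers, long runMillis) {
        // In-memory stand-in for the SQS queue; bounded so producers cannot exhaust memory.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicInteger producedCount = new AtomicInteger();
        AtomicInteger consumedCount = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < producers; i++) {
            threads.add(new Thread(() -> {
                try {
                    while (!stop.get()) {
                        // Stands in for SendMessage.
                        if (queue.offer("message", 10, TimeUnit.MILLISECONDS)) {
                            producedCount.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (int i = 0; i < consumers; i++) {
            threads.add(new Thread(() -> {
                try {
                    while (!stop.get()) {
                        // Stands in for ReceiveMessage followed by DeleteMessage.
                        if (queue.poll(10, TimeUnit.MILLISECONDS) != null) {
                            consumedCount.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }

        for (Thread t : threads) {
            t.start();
        }
        try {
            Thread.sleep(runMillis); // the main thread lets the workers run for the set time
            stop.set(true);          // then signals every worker to stop
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            stop.set(true);
            Thread.currentThread().interrupt();
        }
        return new int[] {producedCount.get(), consumedCount.get()};
    }

    public static void main(String[] args) {
        int[] counts = run(2, 4, 200);
        System.out.println("produced=" + counts[0] + " consumed=" + counts[1]);
    }
}
```

The shared stop flag and the atomic counters play the same roles as the stop, producedCount, and consumedCount objects in the sample code.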

In the program, each producer thread sends messages until the main thread stops the producer thread. The producedCount object tracks the number of messages produced by all producer threads. Error handling is simple: if a request fails, the producer logs the error and the program exits. Requests that fail on transient errors are, by default, retried three times by the AmazonSQSClient, so very few such errors are surfaced. The retry count can be configured as necessary to reduce the number of exceptions that are thrown. The run() method on the message producer is implemented as follows:

    try {
        while (!stop.get()) {
            sqsClient.sendMessage(new SendMessageRequest(queueUrl, theMessage));
            producedCount.incrementAndGet();
        }
    } catch (AmazonClientException e) {
        // By default AmazonSQSClient retries calls 3 times before failing,
        // so when this rare condition occurs, simply stop.
        log.error("Producer: " + e.getMessage());
        System.exit(1);
    }

The batch producer is much the same. One noteworthy difference is the need to retry failed individual batch entries:

    SendMessageBatchResult batchResult = sqsClient.sendMessageBatch(batchRequest);
    if (!batchResult.getFailed().isEmpty()) {
        log.warn("Producer: retrying sending " + batchResult.getFailed().size() + " messages");
        // Every message in this sample has the same body, so each failed
        // entry is retried by sending theMessage again as a single request.
        for (int i = 0, n = batchResult.getFailed().size(); i < n; i++)
            sqsClient.sendMessage(new SendMessageRequest(queueUrl, theMessage));
    }

The consumer run() method is as follows:

    while (!stop.get()) {
        // With the default request settings, ReceiveMessage returns at most one message.
        result = sqsClient.receiveMessage(new ReceiveMessageRequest(queueUrl));
        if (!result.getMessages().isEmpty()) {
            m = result.getMessages().get(0);
            sqsClient.deleteMessage(new DeleteMessageRequest(queueUrl, m.getReceiptHandle()));
            consumedCount.incrementAndGet();
        }
    }

Each consumer thread receives and deletes messages until it's stopped by the main thread. The consumedCount object tracks the number of messages that are consumed by all consumer threads, and the count is periodically logged. The batch consumer is similar, except that up to ten messages are received at a time, and it uses DeleteMessageBatch instead of DeleteMessage.

Running the Example

You can use the provided AWS CloudFormation templates to run the example code in three different configurations: one host with single-operation requests, two hosts with single-operation requests, and one host with batch requests.


The complete sample is available in a single .tar file. The resources that are deployed by each template are described later in this section.

The code for the samples is available on the provisioned instances in /tmp/sqs-producer-consumer-sample/src. The command line for the configured run is in /tmp/sqs-producer-consumer-sample/command.log.

The default duration (20 minutes) is set to provide three or four 5-minute CloudWatch data points of volume metrics. The Amazon EC2 cost for each run is the m1.large instance cost. The Amazon SQS cost varies based on the API call rate for each sample, which ranges from approximately 38,000 API calls per minute for the batching sample to 380,000 API calls per minute for the two-host, single-operation sample.
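
To put these call rates in perspective, the following sketch multiplies them by the default 20-minute run. The class and method names are hypothetical, and the sketch deliberately leaves out prices, which depend on current Amazon SQS pricing and are not stated here.

```java
// Hypothetical helper: estimates request volume only, not cost.
final class RunRequestEstimate {

    // Total API requests for a run, given an average request rate per minute.
    static long totalRequests(long requestsPerMinute, long runMinutes) {
        return requestsPerMinute * runMinutes;
    }

    public static void main(String[] args) {
        long runMinutes = 20; // default duration of a run
        System.out.println(totalRequests(38_000, runMinutes));  // batching sample
        System.out.println(totalRequests(380_000, runMinutes)); // two-host, single-operation sample
    }
}
```

At these rates, a default run makes roughly 760,000 requests for the batching sample and roughly 7.6 million for the two-host, single-operation sample.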

If you want to deploy the AWS CloudFormation stack in a region other than the US East (N. Virginia) region, in the Region box of the AWS CloudFormation console, choose a region.

To run the example

  1. Choose the link below that corresponds to the stack that you want to launch:

    • Single Operation API, One Host: This example template uses the single operation form of Amazon SQS API requests: SendMessage, ReceiveMessage, and DeleteMessage. A single m1.large Amazon EC2 instance runs 16 producer threads and 32 consumer threads. View Template

    • Single Operation API, Two Hosts: This example template uses the single operation form of Amazon SQS API requests, but instead of a single m1.large Amazon EC2 instance, it uses two, each with 16 producer threads and 32 consumer threads for a total of 32 producers and 64 consumers. It illustrates the elasticity of Amazon SQS with throughput increasing proportionally to the greater number of producers and consumers. View Template

    • Batch API, One Host: This example template uses the batch form of Amazon SQS API requests on a single m1.large Amazon EC2 instance with 12 producer threads and 20 consumer threads. View Template

  2. If you're prompted, sign in to the AWS Management Console.

  3. In the Create Stack wizard, on the Select Template page, choose Continue.

  4. On the Specify Parameters page, specify how long the program should run, whether or not you want to automatically terminate the Amazon EC2 instances when the run is complete, and provide an Amazon EC2 key pair so that you can access the instances that are running the sample.

  5. Select the I acknowledge that this template may create IAM resources check box. All templates create an AWS Identity and Access Management (IAM) user so that the producer-consumer program can access the queue.

  6. When all the settings are as you want them, choose Continue.

  7. On the Review page, review the settings. If they're as you want them, choose Continue. If not, choose Back and make the necessary changes.

  8. On the final page of the wizard, choose Close. Stack deployment may take several minutes.

To follow the progress of stack deployment, in the AWS CloudFormation console, choose the sample stack. In the lower pane, choose the Events tab. After the stack is created, it should take less than 5 minutes for the sample to start running. When it does, you can see the queue in the Amazon SQS console.

To monitor queue activity, you can do the following:

  • Access the client instance, and open its output log file /tmp/sqs-producer-consumer-sample/output.log for a tally of messages produced and consumed so far. This tally is updated once per second.

  • In the Amazon SQS console, observe changes in the Messages Available and Messages in Flight numbers.

In addition, after a delay of up to 15 minutes after the queue is started, you can monitor the queue in CloudWatch as described later in this topic.

Although the templates and samples have safeguards to prevent excessive use of resources, it's best to delete your AWS CloudFormation stacks when you're done running the samples. To do so, in the AWS CloudFormation console, choose the stack that you want to delete, and then choose Delete Stack. When the resources are all deleted, all CloudWatch metrics drop to zero.

Monitoring Volume Metrics from Example Run

Amazon SQS automatically generates volume metrics for messages sent, received, and deleted. You can access those metrics and others through the CloudWatch console. The metrics can take up to 15 minutes to become available after the queue starts. To manage the search result set, choose Search, and then select the check boxes that correspond to the queues and metrics that you want to monitor.

Here is the NumberOfMessagesSent metric for consecutive runs of the three samples. Your results may vary somewhat, but they should be qualitatively similar:

  • The NumberOfMessagesReceived and NumberOfMessagesDeleted metrics show the same pattern, but we have omitted them from this graph to reduce clutter.

  • The first sample (single operation API on a single m1.large instance) delivers approximately 210,000 messages over 5 minutes, or about 700 messages per second, with the same throughput for receive and delete operations.

  • The second sample (single operation API on two m1.large instances) delivers roughly double that throughput: approximately 440,000 messages in 5 minutes, or about 1,450 messages per second, with the same throughput for receive and delete operations.

  • The last sample (batch API on a single m1.large instance) delivers over 800,000 messages in 5 minutes, or about 2,500 messages per second, with the same throughput for received and deleted messages. With a batch size of 10, these messages are processed with far fewer requests and therefore at lower cost.