Run batch inference - Amazon Bedrock

Run batch inference


Batch inference is in preview and is subject to change. It is currently available only through the API. Access the batch APIs through the following SDKs.

We recommend that you create a virtual environment to use the SDK. Because batch inference APIs aren't available in the latest SDKs, we recommend that you uninstall the latest version of the SDK from the virtual environment before installing the version with the batch inference APIs. For a guided example, see Code samples.

With batch inference, you can submit multiple inference requests as an asynchronous job that runs on data stored in an Amazon S3 bucket. Use batch inference to process large datasets efficiently.


Batch inference isn't supported for provisioned models.

To see quotas for batch inference, see Batch inference quotas.

Amazon Bedrock supports batch inference on the following modalities.

  • Text to embeddings

  • Text to text

  • Text to image

  • Image to image

  • Image to embeddings

To prepare for batch inference, store your data in an Amazon S3 bucket. You can then run and manage batch inference jobs by using the ModelInvocationJob APIs.
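
As a rough sketch of this workflow, the following Python writes input records as JSON Lines and assembles the request for a job-creation call. Several details are assumptions rather than facts from this page: the `recordId`/`modelInput` record layout, the `create_model_invocation_job` operation name, and the request field names are taken from the generally available boto3 `bedrock` client and might differ in the preview SDK. The model input body, bucket URIs, role ARN, and model ID are hypothetical placeholders.

```python
import json


def build_jsonl(prompts):
    """Return JSON Lines text with one input record per prompt.

    Assumption: each record carries a `recordId` and a model-specific
    `modelInput` body. The `inputText` field is a placeholder; the real
    body depends on the model you invoke.
    """
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        record = {
            "recordId": f"REC{i:08d}",  # hypothetical ID scheme
            "modelInput": {"inputText": prompt},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


def build_job_request(job_name, role_arn, model_id, input_s3_uri, output_s3_uri):
    """Assemble keyword arguments for the job-creation call.

    Assumption: field names follow the boto3 `create_model_invocation_job`
    operation; verify them against the SDK version you installed.
    """
    return {
        "jobName": job_name,
        "roleArn": role_arn,
        "modelId": model_id,
        "inputDataConfig": {"s3InputDataConfig": {"s3Uri": input_s3_uri}},
        "outputDataConfig": {"s3OutputDataConfig": {"s3Uri": output_s3_uri}},
    }


def submit_job(bedrock_client, request):
    """Submit the job through a client object and return the job ARN.

    The client is passed in so that this sketch does not depend on a
    particular SDK build being installed.
    """
    response = bedrock_client.create_model_invocation_job(**request)
    return response["jobArn"]
```

In practice you would upload the output of `build_jsonl` to the input bucket and then pass a client created by your installed SDK to `submit_job`.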

Before you can run batch inference, you must receive permissions to call the batch inference APIs. You then configure an IAM service role for Amazon Bedrock that has permissions to carry out batch inference jobs.
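
For illustration, the policies attached to such a service role might resemble the documents built below. This is a minimal sketch, not the required configuration: the trust policy lets the `bedrock.amazonaws.com` service principal assume the role, and the S3 actions listed in the permissions policy are an assumption about what a job needs to read input and write output. The bucket names are hypothetical.

```python
import json


def bedrock_trust_policy():
    """Trust policy that lets Amazon Bedrock assume the service role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(input_bucket, output_bucket):
    """Permissions policy for the input and output buckets.

    Assumption: read and list access on the input bucket and write access
    on the output bucket are enough; check the service documentation for
    the exact actions that batch inference requires.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{input_bucket}/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print both documents so they can be pasted into the IAM console.
    print(json.dumps(bedrock_trust_policy(), indent=2))
    print(json.dumps(s3_access_policy("my-input-bucket", "my-output-bucket"), indent=2))
```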

You can use the batch inference APIs by downloading and installing one of the following AWS SDK packages.