Process multiple prompts with batch inference
With batch inference, you can submit multiple prompts in a single request and generate responses asynchronously, which lets you process large datasets more efficiently than invoking the model once per prompt. You define the model inputs in files, using either the InvokeModel or the Converse API format, and upload those files to an Amazon S3 bucket. You then submit a batch inference job that specifies the S3 bucket. After the job is complete, you retrieve the output files from S3.
Note
Batch inference isn't supported for provisioned models.
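The following is a minimal sketch of the end-to-end flow, assuming an environment with boto3 and the permissions this flow requires. The role ARN, model ID, bucket names, file name, and the Anthropic-style modelInput body are placeholders and assumptions, not values from this guide; substitute your own. Each line of the input file pairs a recordId with a modelInput object in the chosen model's InvokeModel format.

```python
import json
import boto3

# Placeholder values; replace with your own resources.
ROLE_ARN = "arn:aws:iam::111122223333:role/BedrockBatchInferenceRole"
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
INPUT_S3_URI = "s3://amzn-s3-demo-bucket-input/prompts.jsonl"
OUTPUT_S3_URI = "s3://amzn-s3-demo-bucket-output/"

# 1. Build the input file: one JSON record per line, each pairing a
#    recordId with a modelInput body in the model's InvokeModel format.
prompts = ["Summarize the water cycle.", "Explain batch inference in one sentence."]
with open("prompts.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        record = {
            "recordId": f"RECORD{i:07d}",
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 512,
                "messages": [
                    {"role": "user", "content": [{"type": "text", "text": prompt}]}
                ],
            },
        }
        f.write(json.dumps(record) + "\n")

# 2. Upload the input file to the S3 bucket the job will read from.
s3 = boto3.client("s3")
s3.upload_file("prompts.jsonl", "amzn-s3-demo-bucket-input", "prompts.jsonl")

# 3. Submit the batch inference job, pointing it at the input and output buckets.
bedrock = boto3.client("bedrock")
response = bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job",
    roleArn=ROLE_ARN,
    modelId=MODEL_ID,
    inputDataConfig={"s3InputDataConfig": {"s3Uri": INPUT_S3_URI}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": OUTPUT_S3_URI}},
)
print("Job ARN:", response["jobArn"])
```

Writing the prompts as JSON Lines keeps each record independent, so a single record can be traced back to its response by recordId in the output files.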
See the following resources for general information about batch inference:
- To see pricing for batch inference, see Amazon Bedrock pricing.
- To see quotas for batch inference, see Amazon Bedrock endpoints and quotas in the AWS General Reference.
- To receive notifications when batch inference jobs complete or change state, instead of polling for status (as sketched after this list), see Monitor Amazon Bedrock job state changes using Amazon EventBridge.
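If you do poll for completion instead of using EventBridge, the sketch below uses the same placeholder assumptions as the example above; the job ARN is a placeholder, and the set of terminal status values checked here is an assumption rather than an exhaustive list. It waits for the job to finish, then lists the output objects written under the job's S3 output prefix.

```python
import time
import boto3

bedrock = boto3.client("bedrock")
s3 = boto3.client("s3")

# Placeholder job ARN returned by create_model_invocation_job.
job_arn = "arn:aws:bedrock:us-east-1:111122223333:model-invocation-job/abcd1234"

# Poll until the job reaches what we assume are its terminal states.
while True:
    job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    status = job["status"]
    if status in ("Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"):
        break
    time.sleep(60)

if status in ("Completed", "PartiallyCompleted"):
    # Output files are written under the S3 prefix given in outputDataConfig
    # when the job was created.
    output_uri = job["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"]
    bucket, _, prefix = output_uri.removeprefix("s3://").partition("/")
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
        print(obj["Key"])
else:
    print("Job ended with status:", status)
```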