Run model inference

Inference refers to the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence: given an input, the model predicts a probable sequence of tokens that follows and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide a prompt (the input for the model to respond to) and, optionally, inference parameters that adjust the model's response.

Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.

| Output modality | Description | Example use cases |
| --- | --- | --- |
| Text | Provide text input and generate various types of text. | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting |
| Image | Provide text or input images and generate or modify images. | Image generation, image editing, image variation |
| Embeddings | Provide text, images, or both, and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, knowledge base creation |
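
The comparison step the Embeddings row describes can be as simple as a cosine similarity between two output vectors. The following sketch uses the AWS SDK for Python (boto3) and assumes the Amazon Titan text embeddings model (amazon.titan-embed-text-v1) is available in your Region; the request-body fields follow that model's schema and differ for other embeddings models.

```python
# A minimal sketch of generating and comparing text embeddings.
# Assumptions: the amazon.titan-embed-text-v1 model is accessible in your
# account and Region, and its body schema is {"inputText": ...}.
import json
import math

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")


def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v1",  # assumed available
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Higher values indicate greater semantic similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(cosine_similarity(embed("How do I reset my password?"),
                        embed("Steps to recover account access")))
```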

You can run model inference in the following ways.

  • Use any of the Playgrounds to run inference in a user-friendly graphical interface.

  • Send an InvokeModel or InvokeModelWithResponseStream request (a minimal example follows this list).

  • Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request (a batch example also follows this list).

  • Use Amazon Bedrock features, such as Agents for Amazon Bedrock and Knowledge bases for Amazon Bedrock, that run model inference as a step in a larger orchestration. Refer to those sections for more details.
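
As a minimal sketch of the InvokeModel and InvokeModelWithResponseStream requests above, the following boto3 code sends a prompt and two inference parameters to an Anthropic Claude model. The model ID and request-body shape are assumptions: each model provider defines its own body schema, and model access varies by account and Region.

```python
# A minimal sketch of on-demand inference with the AWS SDK for Python (boto3).
import json

import boto3

# Runtime client for model invocation (the control-plane client is "bedrock").
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Request body in the Anthropic Claude Messages format; other providers
# (Amazon Titan, Meta Llama, ...) expect different fields.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,   # inference parameter: cap on output length
    "temperature": 0.5,  # inference parameter: sampling randomness
    "messages": [
        {"role": "user", "content": "Summarize what model inference means."}
    ],
}

# Synchronous request: the full response arrives at once.
response = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed available
    body=json.dumps(body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])

# Streaming variant: chunks arrive as the model generates them. The event
# shapes below are Claude-specific.
stream = client.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
for event in stream["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="")
```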

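For batch inference, a CreateModelInvocationJob request points Amazon Bedrock at a JSONL file of prompts in Amazon S3 and writes the results back to S3. In this sketch, the job name, role ARN, bucket URIs, and model ID are placeholders; the execution role must grant Amazon Bedrock access to both S3 locations.

```python
# A minimal sketch of starting a batch inference job. Note the control-plane
# client ("bedrock"), not the runtime client ("bedrock-runtime").
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_invocation_job(
    jobName="my-batch-inference-job",                           # placeholder
    modelId="anthropic.claude-3-haiku-20240307-v1:0",           # assumed available
    roleArn="arn:aws:iam::111122223333:role/BedrockBatchRole",  # placeholder
    inputDataConfig={
        # JSONL file of records, each holding a model-specific request body
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/input/prompts.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket/output/"}
    },
)
print(response["jobArn"])  # poll GetModelInvocationJob to track job status
```
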
You can run inference with base models, custom models, or provisioned models. To run inference on a custom model, first purchase Provisioned Throughput for it (for more information, see Provisioned Throughput for Amazon Bedrock).

Use these methods to test foundation model responses with different prompts and inference parameters. When you have sufficiently explored them, you can set up your application to run model inference by calling these APIs.

Select a topic to learn more about running model inference through that method. To learn more about using agents, see Agents for Amazon Bedrock.