Run model inference
Inference refers to the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence: given an input, the model predicts a probable sequence of tokens that follows and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide the following inputs.
- Prompt – An input provided to the model in order for it to generate a response. For information about writing prompts, see Prompt engineering guidelines.
- Inference parameters – A set of values that can be adjusted to limit or influence the model response. For information about inference parameters, see Inference parameters and Inference parameters for foundation models.
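To make the two inputs concrete, the sketch below builds a request body that pairs a prompt with a few common inference parameters. The field names follow the Anthropic Claude Messages request format used on Bedrock; other model families expect different body schemas, so treat the exact keys as an assumption and check the documentation for your model.

```python
import json

# A request body combining a prompt with inference parameters.
# Field names assume the Anthropic Claude Messages format on Bedrock;
# other model providers use different schemas.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,    # inference parameter: cap on response length
    "temperature": 0.5,   # inference parameter: randomness of sampling
    "top_p": 0.9,         # inference parameter: nucleus sampling cutoff
    "messages": [
        {"role": "user", "content": "Summarize the water cycle in two sentences."}
    ],
}

# The serialized body is what you would pass as the request payload.
request_body = json.dumps(body)
```

Adjusting `temperature` or `top_p` changes how the model samples from its token probabilities, while `max_tokens` bounds the length of the generated sequence.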
Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.
| Output modality | Description | Example use cases |
| --- | --- | --- |
| Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting |
| Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation |
| Embeddings | Provide text, images, or both, and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, knowledge base creation |
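One common way to compare embedding vectors is cosine similarity. The pure-Python sketch below uses tiny 4-dimensional toy vectors as stand-ins for model output; real Bedrock embeddings have hundreds to thousands of dimensions, but the comparison works the same way.

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three inputs.
# Semantically similar inputs should point in similar directions.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.1, 0.9, 0.3]

# "cat" is closer to "kitten" than to "invoice".
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

Ranking stored embeddings by similarity to a query embedding is the basic operation behind the search, recommendation, and knowledge base use cases in the table above.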
You can run model inference in the following ways.
- Use any of the Playgrounds to run inference in a user-friendly graphical interface.
- Send an InvokeModel or InvokeModelWithResponseStream request.
- Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request.
- The following Amazon Bedrock features use model inference as a step in a larger orchestration. Refer to those sections for more details.
  - Set up a knowledge base and send a RetrieveAndGenerate request.
  - Set up an agent and send an InvokeAgent request.

You can run inference with base models, custom models, or provisioned models. To run inference on a custom model, first purchase Provisioned Throughput for it (for more information, see Provisioned Throughput for Amazon Bedrock).
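The sketch below illustrates two of the programmatic paths: building a JSONL dataset of prompts for a batch inference job, and a helper that sends a single InvokeModel request. The `recordId`/`modelInput` record shape and the Claude-style body are assumptions based on common Bedrock usage; verify the exact schema for your model in the batch inference documentation. The InvokeModel call requires AWS credentials and is defined here but not executed.

```python
import json

prompts = ["What is Amazon Bedrock?", "Explain token sampling briefly."]

# One JSONL record per prompt; batch inference jobs read a dataset in
# this general shape from Amazon S3. Field names are an assumption --
# check the CreateModelInvocationJob docs for your model's schema.
records = [
    {
        "recordId": f"rec-{i:04d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": p}],
        },
    }
    for i, p in enumerate(prompts)
]

jsonl_dataset = "\n".join(json.dumps(r) for r in records)

def run_single_inference(request_body: str, model_id: str):
    """Send one InvokeModel request (needs AWS credentials; not run here)."""
    import boto3  # imported lazily so the module loads without boto3 installed
    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(modelId=model_id, body=request_body)
    return json.loads(response["body"].read())
```

For a handful of prompts, calling InvokeModel in a loop is simplest; batch inference becomes worthwhile when you have a large dataset of prompts to process asynchronously.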
Use these methods to test foundation model responses with different prompts and inference parameters. Once you have sufficiently explored these methods, you can set up your application to run model inference by calling these APIs.
Select a topic to learn more about running model inference through that method. To learn more about using agents, see Agents for Amazon Bedrock.