Submit prompts and generate responses with model inference
Inference is the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence: given an input, the model predicts a probable sequence of tokens that follows, and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide the following inputs:
- Prompt – An input provided to the model in order for it to generate a response. For information about writing prompts, see Prompt engineering concepts. For information about protecting against prompt injection attacks, see Prompt injection security.
- Inference parameters – A set of values that can be adjusted to limit or influence the model response. For information about inference parameters, see Influence response generation with inference parameters and Inference request parameters and response fields for foundation models.
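To make the two inputs concrete, the sketch below assembles a prompt and a few inference parameters into a JSON request body. This is a minimal sketch only: each model family defines its own request schema, so the field names used here (`prompt`, `temperature`, `topP`, `maxTokens`) are illustrative examples, not a universal format. Check Inference request parameters and response fields for foundation models for the schema your model expects.

```python
import json

def build_request_body(prompt, temperature=0.5, top_p=0.9, max_tokens=512):
    """Assemble the two inference inputs: a prompt and inference parameters.

    The field names are illustrative; real field names vary by model family.
    """
    return json.dumps({
        "prompt": prompt,            # the input the model responds to
        "temperature": temperature,  # randomness of token selection
        "topP": top_p,               # nucleus-sampling cutoff
        "maxTokens": max_tokens,     # upper bound on generated tokens
    })

body = build_request_body("Summarize the water cycle in two sentences.")
```

A body built this way would be passed as the request payload of an inference call, alongside the model's identifier.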
Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.
| Output modality | Description | Example use cases |
| --- | --- | --- |
| Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting |
| Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation |
| Embeddings | Provide text, images, or both text and images and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, knowledge base creation |
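The comparison step for embeddings is typically cosine similarity between the output vectors. The sketch below uses toy 4-dimensional vectors standing in for real model output (actual embedding models return hundreds or thousands of dimensions); the vector values are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
query = [0.1, 0.9, 0.2, 0.0]
doc_a = [0.1, 0.8, 0.3, 0.0]  # close to the query
doc_b = [0.9, 0.0, 0.0, 0.4]  # unrelated
```

In a search or recommendation use case, you would rank candidate items by their similarity score against the query embedding.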
When you run inference, you specify the level of throughput to use by selecting a throughput in the console or by specifying it in the `modelId` field of an API request. Throughput defines the number and rate of input and output tokens that you can process. For more information, see Increase throughput for resiliency and processing power.
You can run model inference in the following ways:
- Use any of the Playgrounds to run inference in a user-friendly graphical interface.
- Use the Converse API (Converse and ConverseStream) to implement conversational applications.
- Send an InvokeModel or InvokeModelWithResponseStream request.
- Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request.
- The following Amazon Bedrock features use model inference as a step in a larger orchestration. Refer to those sections for more details.
  - Set up a knowledge base and send a RetrieveAndGenerate request.
  - Set up an agent and send an InvokeAgent request.
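For the batch inference option, the input is a JSON Lines file in which each line pairs a caller-chosen `recordId` with a `modelInput` object. The sketch below builds such a file as a string; the `modelInput` fields are illustrative placeholders, since the schema must match what the target model expects.

```python
import json

# Each line of a batch inference input file pairs a caller-chosen recordId
# with a modelInput object in the target model's request schema.
# The modelInput fields below are illustrative, not a real model schema.
records = [
    {"recordId": "rec-001",
     "modelInput": {"prompt": "Summarize the first article.", "maxTokens": 256}},
    {"recordId": "rec-002",
     "modelInput": {"prompt": "Summarize the second article.", "maxTokens": 256}},
]

jsonl = "\n".join(json.dumps(r) for r in records)
```

A file in this shape, uploaded to S3, is what a CreateModelInvocationJob request points at; the job's output maps each result back to its `recordId`.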
You can run inference with base models, custom models, or provisioned models. To run inference on a custom model, first purchase Provisioned Throughput for it (for more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock).
Use these methods to test foundation model responses with different prompts and inference parameters. Once you have sufficiently explored these methods, you can set up your application to run model inference by calling these APIs.
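As a sketch of what such an application call looks like, the snippet below assembles a Converse request as plain data: messages with role and content blocks, plus an `inferenceConfig`. The model ID is a placeholder, and the actual SDK call (which requires AWS credentials and a supported Region) is shown commented out.

```python
# Placeholder, not a real model ID; substitute a model you have access to.
model_id = "example.placeholder-model-v1"

request = {
    "modelId": model_id,
    "messages": [
        {"role": "user",
         "content": [{"text": "What is the boiling point of water?"}]},
    ],
    "inferenceConfig": {"temperature": 0.5, "maxTokens": 256},
}

# With credentials configured, you would send the request with boto3:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

Appending the model's reply and your next user message to `messages` is how a multi-turn conversation is carried across calls.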
Select a topic to learn more about running model inference through that method. To learn more about using agents, see Automate tasks in your application using conversational agents.
Topics
- Influence response generation with inference parameters
- Generate responses in a visual interface using playgrounds
- Submit a single prompt with the InvokeModel API operations
- Carry out a conversation with the Converse API operations
- Use a tool to complete an Amazon Bedrock model response
- Process multiple prompts with batch inference