Submit prompts and generate responses with model inference
Inference is the process of generating an output from an input provided to a model. Foundation models use probability to construct the words in a sequence: given an input, the model predicts a probable sequence of tokens that follows, and returns that sequence as the output. Amazon Bedrock lets you run inference with the foundation model of your choice. When you run inference, you provide the following inputs:
- Prompt – An input provided to the model in order for it to generate a response. For information about writing prompts, see Prompt engineering concepts. For information about protecting against prompt injection attacks, see Prompt injection security.
- Inference parameters – A set of values that can be adjusted to limit or influence the model response. For information about inference parameters, see Influence response generation with inference parameters and Inference request parameters and response fields for foundation models.
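To make the two inputs concrete, the sketch below assembles a prompt and a few inference parameters into a JSON request body. This is a minimal sketch only: each model family defines its own request schema, so the field names used here (`prompt`, `temperature`, `topP`, `maxTokens`) are illustrative examples, not a universal format. Check Inference request parameters and response fields for foundation models for the schema your model expects.

```python
import json

def build_request_body(prompt, temperature=0.5, top_p=0.9, max_tokens=512):
    """Assemble the two inference inputs: a prompt and inference parameters.

    The field names are illustrative; real field names vary by model family.
    """
    return json.dumps({
        "prompt": prompt,            # the input the model responds to
        "temperature": temperature,  # randomness of token selection
        "topP": top_p,               # nucleus-sampling cutoff
        "maxTokens": max_tokens,     # upper bound on generated tokens
    })

body = build_request_body("Summarize the water cycle in two sentences.")
```

A body built this way would be passed as the request payload of an inference call, alongside the model's identifier.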
Amazon Bedrock offers a suite of foundation models that you can use to generate outputs of the following modalities. To see modality support by foundation model, refer to Supported foundation models in Amazon Bedrock.
| Output modality | Description | Example use cases |
| --- | --- | --- |
| Text | Provide text input and generate various types of text | Chat, question answering, brainstorming, summarization, code generation, table creation, data formatting, rewriting |
| Image | Provide text or input images and generate or modify images | Image generation, image editing, image variation |
| Embeddings | Provide text, images, or both text and images and generate a vector of numeric values that represents the input. The output vector can be compared to other embedding vectors to determine semantic similarity (for text) or visual similarity (for images). | Text and image search, query, categorization, recommendations, personalization, knowledge base creation |
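The comparison step for embeddings is typically cosine similarity between the output vectors. The sketch below uses toy 4-dimensional vectors standing in for real model output (actual embedding models return hundreds or thousands of dimensions); the vector values are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embedding-model output.
query = [0.1, 0.9, 0.2, 0.0]
doc_a = [0.1, 0.8, 0.3, 0.0]  # close to the query
doc_b = [0.9, 0.0, 0.0, 0.4]  # unrelated
```

In a search or recommendation use case, you would rank candidate items by their similarity score against the query embedding.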
When you run inference, you specify the level of throughput to use by selecting a throughput in the console or by specifying it in the `modelId` field of an API request. Throughput defines the number and rate of input and output tokens that you can process. For more information, see Increase throughput for resiliency and processing power.
You can run model inference in the following ways:
- Use any of the Playgrounds to run inference in a user-friendly graphical interface.
- Use the Converse API (Converse and ConverseStream) to implement conversational applications.
- Send an InvokeModel or InvokeModelWithResponseStream request.
- Prepare a dataset of prompts with your desired configurations and run batch inference with a CreateModelInvocationJob request.
- The following Amazon Bedrock features use model inference as a step in a larger orchestration. Refer to those sections for more details.
  - Set up a knowledge base and send a RetrieveAndGenerate request.
  - Set up an agent and send an InvokeAgent request.
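For the batch inference option, the input is a JSON Lines file in which each line pairs a caller-chosen `recordId` with a `modelInput` object. The sketch below builds such a file as a string; the `modelInput` fields are illustrative placeholders, since the schema must match what the target model expects.

```python
import json

# Each line of a batch inference input file pairs a caller-chosen recordId
# with a modelInput object in the target model's request schema.
# The modelInput fields below are illustrative, not a real model schema.
records = [
    {"recordId": "rec-001",
     "modelInput": {"prompt": "Summarize the first article.", "maxTokens": 256}},
    {"recordId": "rec-002",
     "modelInput": {"prompt": "Summarize the second article.", "maxTokens": 256}},
]

jsonl = "\n".join(json.dumps(r) for r in records)
```

A file in this shape, uploaded to S3, is what a CreateModelInvocationJob request points at; the job's output maps each result back to its `recordId`.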
You can run inference with base models, custom models, or provisioned models. To run inference on a custom model, first purchase Provisioned Throughput for it (for more information, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock).
Use these methods to test foundation model responses with different prompts and inference parameters. Once you have sufficiently explored these methods, you can set up your application to run model inference by calling these APIs.
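As a sketch of what such an application call looks like, the snippet below assembles a Converse request as plain data: messages with role and content blocks, plus an `inferenceConfig`. The model ID is a placeholder, and the actual SDK call (which requires AWS credentials and a supported Region) is shown commented out.

```python
# Placeholder, not a real model ID; substitute a model you have access to.
model_id = "example.placeholder-model-v1"

request = {
    "modelId": model_id,
    "messages": [
        {"role": "user",
         "content": [{"text": "What is the boiling point of water?"}]},
    ],
    "inferenceConfig": {"temperature": 0.5, "maxTokens": 256},
}

# With credentials configured, you would send the request with boto3:
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(**request)
# print(response["output"]["message"]["content"][0]["text"])
```

Appending the model's reply and your next user message to `messages` is how a multi-turn conversation is carried across calls.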
Select a topic to learn more about running model inference through that method. To learn more about using agents, see Automate tasks in your application using conversational agents.
Topics
- Influence response generation with inference parameters
- Generate responses in a visual interface using playgrounds
- Submit a single prompt with the InvokeModel API operations
- Carry out a conversation with the Converse API operations
- Use a tool to complete an Amazon Bedrock model response
- Process multiple prompts with batch inference