Note
Intelligent prompt routing in Amazon Bedrock is in preview and is subject to change.
Amazon Bedrock intelligent prompt routing provides a single serverless endpoint that efficiently routes requests between different foundation models within the same model family. It dynamically predicts the response quality of each model for each request and routes the request to the model expected to give the best response quality. This helps you optimize for both response quality and cost.
Benefits
- Optimized response quality and cost: Routes prompts to different foundation models to achieve the best response quality at the lowest cost.
- Simplified management: Eliminates the need for complex orchestration logic.
- Future-proof: Incorporates new models as they become available.
Default and configured prompt routers
When using intelligent prompt routing, you can either use the default prompt routers provided by Amazon Bedrock, or configure your own prompt routers.
Default prompt routers are pre-configured routing systems provided by Amazon Bedrock. These routers come with predefined settings and are designed to work out of the box with specific foundation models. They provide a straightforward, ready-to-use solution without needing to configure any routing settings. When starting with intelligent prompt routing, we recommend that you experiment with the default routers provided by Amazon Bedrock. During the preview, you can choose from select models in the Anthropic and Meta families.
Configured prompt routers enable you to define your own routing configurations tailored to your specific needs and preferences. They are more suitable when you require more control over how your requests are routed and which models are used. Configured routers enable optimization based on response quality metrics and use cases. After you've experimented with the default routers, you can configure your own routers suited to your applications, evaluate the response quality in the playground, and use them for production applications if they meet your requirements.
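If you're working programmatically, the following sketch shows one way to inspect the prompt routers available in your account with the AWS SDK for Python (Boto3). It assumes your SDK version exposes the prompt router operations (ListPromptRouters and GetPromptRouter); parameter and field names may differ while the feature is in preview.

```python
# Sketch only: list the prompt routers available in the account and inspect one.
# Assumes Boto3 exposes the preview prompt router operations; field names may
# change while the feature is in preview.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# List the routers (default and configured) available to the account.
routers = bedrock.list_prompt_routers()
for summary in routers["promptRouterSummaries"]:
    print(summary["promptRouterName"], summary["promptRouterArn"])

# Look at the models and routing criteria behind a specific router.
router_arn = routers["promptRouterSummaries"][0]["promptRouterArn"]
details = bedrock.get_prompt_router(promptRouterArn=router_arn)
print(details["models"])           # models the router can choose between
print(details["fallbackModel"])    # model used when the routing criteria aren't met
print(details["routingCriteria"])  # configured response quality difference
```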
Considerations and limitations
The following are considerations and limitations for intelligent prompt routing in Amazon Bedrock.
- Intelligent prompt routing is only optimized for English prompts.
- Intelligent prompt routing can't adjust routing decisions or responses based on application-specific performance data.
- Intelligent prompt routing might not provide optimal routing for unique or specialized use cases. How effective the routing is depends on the initial training data.
Prompt router criteria and fallback model
When configuring your prompt routers, you can specify the routing criteria, which determine which model is selected to process a request based on the response quality difference. Use this criterion to control how much better the other model's responses must be than the fallback model's responses before the router selects the other model.
Fallback models
Choose a fallback model that works well for your requests. This model serves as a reliable baseline. You can then choose another model to either improve accuracy or reduce costs compared to the fallback model. The fallback model acts as an anchor, and the routing criteria determine when to switch to the other model based on the response quality difference.
Response quality difference
The response quality difference measures the disparity between the responses of the fallback model and the other models. A smaller value indicates that the responses are similar. A higher value indicates a significant difference in the responses between the fallback model and the other models.
For example, suppose you set the response quality difference to 10% and the fallback model is Anthropic Claude 3 Haiku. The router switches to the other model, say Claude 3 Sonnet, only if its responses are predicted to be at least 10% better than Claude 3 Haiku's responses. Otherwise, the request is routed to the fallback model.
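To make this concrete, here is a minimal sketch of creating a configured prompt router with Boto3 that encodes the example above. The operation name, the model ARNs, and the exact numeric format of the response quality difference are assumptions based on the preview API; check the current API reference before relying on them.

```python
# Sketch only: create a configured prompt router that prefers the fallback model
# (Claude 3 Haiku) unless the other model (Claude 3 Sonnet) is predicted to
# respond noticeably better. The model ARNs below are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

haiku_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
sonnet_arn = "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"

response = bedrock.create_prompt_router(
    promptRouterName="my-anthropic-router",
    description="Routes between Claude 3 Haiku and Claude 3 Sonnet",
    models=[{"modelArn": haiku_arn}, {"modelArn": sonnet_arn}],
    fallbackModel={"modelArn": haiku_arn},
    # Intended as the 10% response quality difference from the example above;
    # confirm the expected value range (e.g. 10 vs. 0.10) in the API reference.
    routingCriteria={"responseQualityDifference": 10},
)
print(response["promptRouterArn"])
```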
How intelligent prompt routing works
- Model selection and router configuration: Choose the family of models you want to use for your application. If you're using default prompt routers, you can choose from models in the Anthropic or Meta families. If you're using configured prompt routers, you can choose from additional models and configure the routing criteria. For more information, see How to use intelligent prompt routing.
- Incoming request analysis: For each incoming request, the system analyzes the prompt to understand its content and context.
- Response quality prediction: Amazon Bedrock predicts the response quality of each specified model in the chosen family based on the prompt. If you configured your prompt router, it takes the routing criteria (the response quality difference) into account and routes requests to your specified fallback model if the criteria aren't met.
- Model selection and request forwarding: Based on the response quality prediction, Amazon Bedrock dynamically chooses the model that offers the best combination of response quality and cost for the specific request. The request is then forwarded to the chosen model for processing.
- Response handling: The response from the chosen model is retrieved and returned to the user. The response includes information about the model that was used to process the request.
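From an application's point of view, all of this happens behind a single endpoint: you invoke the router the same way you invoke a model, for example through the Converse API with the router's ARN as the model ID. The sketch below uses a placeholder router ARN and assumes the invoked model is reported in the response trace, which may change while the feature is in preview.

```python
# Sketch only: send a request to a prompt router through the Converse API.
# The router ARN is a placeholder; the trace field names are assumptions
# based on the preview API and may differ in your SDK version.
import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

router_arn = "arn:aws:bedrock:us-east-1:111122223333:prompt-router/my-anthropic-router"

response = runtime.converse(
    modelId=router_arn,  # the router ARN is used in place of a model ID
    messages=[{"role": "user", "content": [{"text": "Summarize the water cycle in two sentences."}]}],
)

# The generated text from whichever model the router selected.
print(response["output"]["message"]["content"][0]["text"])

# Which model actually processed the request (field names are assumptions).
print(response.get("trace", {}).get("promptRouter", {}).get("invokedModelId"))
```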
How to use intelligent prompt routing
To get started with intelligent prompt routing, use the Amazon Bedrock console, AWS CLI, or AWS SDK.
Note
To get the most out of intelligent prompt routing, regularly review performance so that you can take advantage of new models as they become available. To optimize your usage, monitor the available performance and cost metrics.
The following sections show you how to use this feature from the console and the CLI. After you configure your prompt router, Amazon Bedrock will perform the steps described in How intelligent prompt routing works to generate a response from one of the models in the chosen router.
To use intelligent prompt routing from the AWS Management Console:
- Sign in to the AWS Management Console with your AWS credentials and go to the Prompt Routers hub in the Amazon Bedrock console.
- Choose the model family that you want to use. If you're using the feature for the first time, you can experiment with the default prompt routers. During the preview, you can choose from models in the Anthropic or Meta families. You can then open the playground and experiment with your prompts.
Note
You must choose exactly two models within the same family.
- After you've experimented with the default routers, you can configure your own router. You must provide a unique name for your router and, optionally, a description.
- Define the routing rules for routing prompts to different models. Specify the models for routing, the routing criteria, and a fallback model to be used if the routing criteria aren't met.
- You can now open the playground and try different prompts to monitor the performance of your prompt router. By analyzing how prompts are being routed and how each model performs, you can adjust the routing criteria and models as needed to optimize for performance and cost.
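If you prefer to run this kind of evaluation outside the console, a simple loop with the AWS SDK can show how your test prompts are being distributed across the two models. As before, the router ARN and the trace field used to read the invoked model are assumptions based on the preview API.

```python
# Sketch only: send a batch of test prompts through a prompt router and tally
# which model handled each one, to help judge whether the routing criteria
# need adjusting. The router ARN and trace field names are placeholders.
from collections import Counter

import boto3

runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
router_arn = "arn:aws:bedrock:us-east-1:111122223333:prompt-router/my-anthropic-router"

test_prompts = [
    "What is the capital of France?",
    "Explain the trade-offs between optimistic and pessimistic locking.",
    "Write a haiku about autumn.",
]

routed_to = Counter()
for prompt in test_prompts:
    response = runtime.converse(
        modelId=router_arn,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    invoked = response.get("trace", {}).get("promptRouter", {}).get("invokedModelId", "unknown")
    routed_to[invoked] += 1

# If most prompts land on a single model, consider adjusting the response
# quality difference or the models configured in the router.
print(routed_to)
```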