Understanding intelligent prompt routing in Amazon Bedrock
Note
Intelligent prompt routing in Amazon Bedrock is in preview and is subject to change.
Amazon Bedrock intelligent prompt routing provides a single serverless endpoint to efficiently route requests between different foundational models within the same model family. It can dynamically predict the response quality of each model for each request, and then route the request to the model with the best response quality. This helps to optimize for both response quality and cost.
Topics
Benefits
-
Optimized Response Quality and Cost: Routes prompts to different foundational models to achieve the best response quality at the lowest cost.
-
Simplified Management: Eliminates the need for complex orchestration logic.
-
Future-Proof: Incorporates new models as they become available.
Default and configured prompt routers
When using intelligent prompt routing, you can either use the default prompt routers provided by Amazon Bedrock, or configure your own prompt routers.
Default prompt routers are pre-configured routing systems provided by Amazon Bedrock. These routers come with predefined settings and are designed to work out-of-the-box with specific foundational models. They provide a straightforward, ready-to-use solution without needing to configure any routing settings. When starting with intelligent prompt routing, we recommend that you experiment using the default routers provided by Amazon Bedrock. During preview, you can choose to use select models in the Anthropic and Meta families.
Configured prompt routers enable you to define your own routing configurations tailored to specific needs and preferences. They are more suitable when you require more control over how to route your requests and which models to use. Configured routers enable optimization based on response quality metrics and use cases. After you've experimented with default routers, you can configure your own routers that are suitable to your applications, evaluate the response quality in the playground, and use for production applications if it meets the requirements.
Considerations and limitations
The following are considerations and limitations for intelligent prompt routing in Amazon Bedrock.
-
Intelligent prompt routing is only optimized for English prompts.
-
Intelligent prompt routing can’t adjust routing decisions or responses based on application-specific performance data.
-
Intelligent prompt routing might not always provide the most optimal routing for unique or specialized use cases. How effective the routing is depends on the initial training data.
Prompt router criteria and fallback model
When configuring your prompt routers, you can specify the routing criteria, which is used to determine which model to select for processing a request based on the response quality difference. Use this criteria to determine how much closer the responses of the fallback model should be to the responses of the other models.
Fallback models
Choose a fallback model that works well for your requests. This model serves as a reliable baseline. You can then choose another model to either improve accuracy or reduce costs compared to the fallback model. The fallback model acts as an anchor, and the routing criteria determines when to switch to the other model based on the response quality difference.
Response quality difference
The response quality difference measures the disparity between the responses of the fallback model and the other models. A smaller value indicates that the responses are similar. A higher value indicates a significant difference in the responses between the fallback model and the other models.
For example, a response quality difference of 10% means that, say the response quality of the fallback model, Claude Haiku3, is 10%, then the router will switch to another model, say Claude Sonnet3, only if its responses are 10% better than Claude Haiku3's responses.
How intelligent prompt routing works
-
Model selection and router configuration
Choose the family of models you want to use for your application. If you're using default prompt routers, you can choose from models in the Anthropic or Meta families. If you're using configured prompt routers, you can choose from additional models and configure the routing criteria. For more information, see How to use intelligent prompt routing.
-
Incoming request analyis
For each incoming request, the system analyzes the prompt to understand its content and context.
-
Response quality prediction
Amazon Bedrock predicts the response quality of each specified model in the chosen family based on the prompt. If you configured your prompt router, it takes into account the routing criteria, which is the response quality difference, and routes requests to your specified fallback model if the criteria is not met.
-
Model selection and request forwarding
Based on the response quality prediction, Amazon Bedrock dynamically chooses the model that offers the best combination of response quality and cost for the specific request. The request is then forwarded to the chosen model for processing.
-
Response handling
The response from the chosen model is retrieved and returned to the user. The response includes information about the model that was used to process the request.
How to use intelligent prompt routing
To get started with intelligent prompt routing, use the Amazon Bedrock console, AWS CLI, or AWS SDK.
Note
To best utilize intelligent prompt routing, you should regularly review performance to take advantage of new models. To optimize your usage, monitor the available performance and cost metrics.
The following sections show you how to use this feature from the console and the CLI. After you configure your prompt router, Amazon Bedrock will perform the steps described in How intelligent prompt routing works to generate a response from one of the models in the chosen router.