GENCOST03-BP02 Control model response length

The cost of invoking a foundation model often scales with the length of the model's responses, because output tokens are billed per token. This best practice describes how to control model response length to reduce costs.

Desired outcome: When implemented, this best practice keeps model responses as short as possible without sacrificing usability.

Benefits of establishing this best practice: Adopt a consumption model - Foundation models on a consumption-based pricing model charge by the token, so reducing model response length directly reduces the cost of inference. For example, at an illustrative rate of $0.01 per 1,000 output tokens, trimming the average response from 500 tokens to 100 tokens cuts the output cost of each request by 80 percent.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Model responses should be kept as concise as possible while still satisfying the use case. In Amazon Bedrock, consider specifying a maximum response length inference parameter (for example, maxTokens in the Converse API) to enforce a predictable upper limit on response length. Additionally, consider adding a phrase to your prompts that encourages the model to be succinct, further reducing response length while maintaining a high degree of quality. Even small reductions in response token counts lower the cost of the model's generated output. A minimal sketch of both techniques follows.
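
The following sketch shows both techniques together using the Amazon Bedrock Converse API through boto3: a hard token limit in the inference configuration and a succinctness instruction in the prompt. The model ID, the token limit of 256, and the prompt wording are illustrative assumptions, not prescriptions.

```python
# A minimal sketch: capping response length with the Bedrock Converse API.
# The model ID, maxTokens value, and prompt wording are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model; substitute your own
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Asking for succinctness in the prompt reduces output tokens
                    # beyond what the hard limit alone enforces.
                    "text": "In two sentences or fewer, summarize the key idea of "
                            "retrieval-augmented generation. Be succinct."
                }
            ],
        }
    ],
    # Hard upper limit on output tokens, and therefore on output cost.
    inferenceConfig={"maxTokens": 256, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```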

Implementation steps

  1. Understand how the model response will be used, and define a minimal response schema (for example, 0 for affirmative and 1 for rejection).

  2. Inform the model in the prompt of the requested response schema, and ask the model to respond in kind.

  3. Set a hard limit on the response length by configuring the maximum response length inference parameter accordingly, as shown in the sketch after this list.

  4. Continue testing and optimizing the model's responses to verify that they satisfy the workload requirements.
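
As a sketch of steps 1 through 3, the following example asks the model a yes-or-no question using the minimal 0/1 schema from step 1 and enforces a token limit of a few tokens. The model ID, prompt wording, and limit values are illustrative assumptions.

```python
# A minimal sketch of steps 1-3: a 0/1 response schema with a hard token cap.
# Model ID, prompt wording, and limits are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime")

question = ("Does the following review express a positive sentiment? "
            "Review: 'Great product, fast shipping.'")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model; substitute your own
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # Step 2: tell the model the response schema and ask it to comply.
                    "text": f"{question}\n"
                            "Respond with a single character: 0 for affirmative, "
                            "1 for rejection. Do not add any other text."
                }
            ],
        }
    ],
    # Step 3: the hard limit keeps the response, and its cost, to a few tokens at most.
    inferenceConfig={"maxTokens": 5, "temperature": 0},
)

answer = response["output"]["message"]["content"][0]["text"].strip()
if answer not in ("0", "1"):
    # Step 4: responses outside the schema signal that the prompt needs tuning.
    raise ValueError(f"Unexpected response: {answer!r}")
print("affirmative" if answer == "0" else "rejection")
```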
