Cost optimization

Each cost optimization best practice introduced in this paper reflects at least one of the following principles:

Optimize model and inference selection: Choose foundation models and inference approaches that align with your actual performance requirements and avoid over-provisioning. By carefully evaluating model size, accuracy needs, and inference paradigms, you can reduce operational costs while maintaining necessary quality levels. This principle helps you avoid paying for excess capacity or capabilities that don't deliver proportional business value.
Control resource consumption parameters: Implement strict controls over the variables that directly affect usage costs in generative AI systems. By managing prompt lengths, response sizes, and vector dimensions, you can minimize token usage and storage requirements while still delivering the required functionality. This approach allows you to maintain cost efficiency at the operational level while preserving essential system capabilities.
Design workflow boundaries: Establish clear limits and exit conditions for generative AI processes to avoid runaway resource consumption. By implementing stopping conditions and monitoring execution patterns, you can avoid scenarios where workflows consume excessive resources or continue beyond their useful purpose. This helps you predict costs and avoid unexpected budget overruns.
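
As an illustration of the last two principles, the following is a minimal sketch of a workflow loop that caps prompt length, total token usage, and the number of steps. It is not tied to any particular model API: the names Budget, estimate_tokens, and run_bounded are hypothetical, the step callable stands in for a model call, and the four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Budget:
    """Hypothetical limits for one workflow run."""
    max_steps: int          # hard cap on workflow iterations
    max_total_tokens: int   # hard cap on estimated tokens across all steps
    max_prompt_chars: int   # crude cap on prompt length


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Use the model's own
    # tokenizer or reported usage figures in a real system.
    return max(1, len(text) // 4)


def run_bounded(
    step: Callable[[str], Tuple[str, bool]],
    prompt: str,
    budget: Budget,
) -> Tuple[List[str], str]:
    """Run `step` repeatedly until it reports completion or a limit is hit.

    `step` is a placeholder for a model call; it returns the response text
    and a flag saying whether the workflow is finished.
    """
    outputs: List[str] = []
    tokens_used = 0
    for _ in range(budget.max_steps):
        # Control prompt length before spending any tokens on it.
        prompt = prompt[: budget.max_prompt_chars]
        cost = estimate_tokens(prompt)
        # Exit condition 1: the next call would exceed the token budget.
        if tokens_used + cost > budget.max_total_tokens:
            return outputs, "token_budget_exhausted"
        response, done = step(prompt)
        tokens_used += cost + estimate_tokens(response)
        outputs.append(response)
        # Exit condition 2: the workflow reports that it is finished.
        if done:
            return outputs, "completed"
        prompt = response
    # Exit condition 3: the step limit was reached without completion.
    return outputs, "step_limit_reached"
```

Returning the stop reason alongside the outputs makes it possible to monitor how often a workflow ends on a limit rather than on completion, which is one signal that the limits or the workflow design need adjusting.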
