GENPERF01-BP02 Collect performance metrics from generative AI workloads
Foundation model performance on a given task can be measured in many different ways. When selecting foundation models for generative AI workloads, measure and compare model performance over time.
Desired outcome: When implemented, your organization improves its ability to evaluate model performance.
Benefits of establishing this best practice: Experiment more often - Testing model performance assists in the selection of foundation models for generative AI workloads.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Consider introducing a centralized logging and monitoring solution for generative AI workloads. For example, Amazon CloudWatch integrates directly with other AWS services like Amazon Bedrock, the Amazon Q family of services, and Amazon SageMaker AI Inference Endpoints. By configuring Amazon CloudWatch or a similar service, you can collect performance metrics from model endpoints. Use these metrics to develop and prioritize a roadmap of improvements to your generative AI solutions.
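As a minimal sketch of collecting endpoint metrics, the following Python builds a CloudWatch GetMetricStatistics request for an Amazon Bedrock model. It assumes the `AWS/Bedrock` namespace, the `ModelId` dimension, and metric names such as `InvocationLatency` as described in the Bedrock monitoring documentation; verify these against your own account. The helper names `build_metric_query` and `fetch_datapoints`, and the model ID in the usage note, are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(model_id, metric_name, hours=1, period_seconds=300,
                       statistics=("Average", "Maximum"), now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    Assumes Bedrock runtime metrics live in the AWS/Bedrock namespace
    and are keyed by a ModelId dimension.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Bedrock",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,  # seconds per datapoint
        "Statistics": list(statistics),
    }


def fetch_datapoints(model_id, metric_name):
    """Query CloudWatch. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so build_metric_query works offline

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_metric_query(model_id, metric_name)
    )
    # Datapoints are not guaranteed to arrive in time order.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

For example, `fetch_datapoints("example-model-id", "InvocationLatency")` would return the last hour of latency datapoints in five-minute buckets, which you could chart or compare across candidate models.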
Performance metrics should also be collected by the applications and services that interact with model endpoints and other generative AI services. Collect metrics and application traces that cover the end-to-end flow of information, rather than just a single piece of the workflow. Use Amazon CloudWatch or a similar service to determine how your entire application performs when interacting with generative AI solutions. This can help you triage performance concerns faster and improve resolution times.
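One way to capture the whole flow rather than a single call is to time each stage of a request in the application and publish the results as custom metrics. The sketch below uses only the Python standard library; the class `WorkflowTrace`, the metric names `StageLatency` and `EndToEndLatency`, and the dimension names are illustrative assumptions, not part of any AWS or OpenLLMetry API. The output is shaped like the `MetricData` entries that CloudWatch PutMetricData accepts.

```python
import time
from contextlib import contextmanager


class WorkflowTrace:
    """Records how long each named stage of one request takes."""

    def __init__(self, workflow_name, clock=time.perf_counter):
        self.workflow_name = workflow_name
        self._clock = clock
        self.stages = []  # list of (stage_name, duration_ms)

    @contextmanager
    def stage(self, name):
        """Time the enclosed block, even if it raises."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.stages.append((name, elapsed_ms))

    def to_metric_data(self):
        """Return entries shaped for CloudWatch PutMetricData."""
        workflow_dim = {"Name": "Workflow", "Value": self.workflow_name}
        data = [
            {
                "MetricName": "StageLatency",  # hypothetical metric name
                "Dimensions": [workflow_dim, {"Name": "Stage", "Value": name}],
                "Value": duration_ms,
                "Unit": "Milliseconds",
            }
            for name, duration_ms in self.stages
        ]
        # Sum of stage times; time spent between stages is not counted.
        data.append({
            "MetricName": "EndToEndLatency",  # hypothetical metric name
            "Dimensions": [workflow_dim],
            "Value": sum(ms for _, ms in self.stages),
            "Unit": "Milliseconds",
        })
        return data
```

In use, you would wrap each step (for example `with trace.stage("retrieval"):` and `with trace.stage("model_invocation"):`) and then pass `trace.to_metric_data()` as the `MetricData` argument of a boto3 CloudWatch client's `put_metric_data` call, under a custom namespace of your choosing. Per-stage dimensions make it easier to see whether latency comes from the model call or from the surrounding application.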
Implementation steps
- Identify and collect CloudWatch metrics.
- Implement a trace framework like OpenLLMetry to capture additional metrics.
- Establish reasonable alarm thresholds, and set alerts to go off when those thresholds are breached.
- Determine the remediation action for the alarm.
  - Infrastructure alarms may require horizontal scaling to remediate any issues.
  - Model alarms may inform a re-examination of the model selection process.
- Automate resolution actions where possible.
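The alarm and remediation steps above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function `build_latency_alarm`, the alarm naming scheme, the p90 statistic, the evaluation window, and the remediation labels in `REMEDIATIONS` are all hypothetical choices. The dictionary keys match the parameters of the CloudWatch PutMetricAlarm API, and the namespace and metric name assume Amazon Bedrock runtime metrics.

```python
def build_latency_alarm(model_id, threshold_ms, sns_topic_arn):
    """Build keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"genai-latency-{model_id}",  # hypothetical naming scheme
        "Namespace": "AWS/Bedrock",
        "MetricName": "InvocationLatency",
        "Dimensions": [{"Name": "ModelId", "Value": model_id}],
        "ExtendedStatistic": "p90",   # percentile rather than average
        "Period": 300,                # five-minute evaluation buckets
        "EvaluationPeriods": 3,       # breach must persist for 15 minutes
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not an incident
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical mapping from alarm category to a remediation action,
# following the guidance above: scale out for infrastructure alarms,
# revisit model selection for model alarms.
REMEDIATIONS = {
    "infrastructure": "scale_out",
    "model": "review_model_selection",
}


def choose_remediation(alarm_category):
    """Return the remediation label, defaulting to human escalation."""
    return REMEDIATIONS.get(alarm_category, "page_on_call")
```

You could create the alarm with `boto3.client("cloudwatch").put_metric_alarm(**build_latency_alarm(...))`, which requires boto3 and AWS credentials. Keeping the category-to-action mapping in one place makes it straightforward to replace a label with an automated action later, while unknown categories still reach a person.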