MLSUS-11: Align SLAs with sustainability goals

Define service level agreements (SLAs) that support your sustainability goals and that meet, but do not exceed, your business requirements. Make trade-offs that significantly reduce environmental impact in exchange for acceptable decreases in service levels.

Implementation plan

Queue incoming requests and process them asynchronously - If your users can tolerate some latency, deploy your model on serverless or asynchronous endpoints to reduce resources that are idle between tasks and minimize the impact of load spikes. These options will automatically scale the instance or endpoint count to zero when there are no requests to process, so you only maintain an inference infrastructure when your endpoint is processing requests.
Adjust availability - If your users can tolerate some latency in the rare case of a failover, don't provision extra capacity. If an outage occurs or an instance fails, Amazon SageMaker AI automatically attempts to distribute your instances across Availability Zones. Adjusting availability is an example of a conscious trade-off you can make to meet your sustainability targets.
Adjust response time - When you don't need real-time inference, use SageMaker AI Batch Transform. Unlike a persistent endpoint, the clusters are decommissioned when batch transform jobs finish, so you don't continuously maintain an inference infrastructure.
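As a rough illustration of the first and third options above, the following Python sketch builds the request parameters for a serverless endpoint configuration and for a batch transform job. It is a minimal sketch under stated assumptions, not a reference deployment: the resource names, S3 URIs, memory size, concurrency limit, and instance type are hypothetical placeholders, and the content type assumes line-delimited CSV input. The request builders run without AWS access; only the optional `submit` helper calls the SageMaker API through boto3.

```python
from typing import Any, Dict


def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> Dict[str, Any]:
    """Build CreateEndpointConfig parameters for a serverless variant.

    A serverless variant has no instance type or instance count, so no
    capacity is kept running while the endpoint receives no requests.
    """
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,        # assumed size; tune per model
                "MaxConcurrency": max_concurrency,  # assumed limit; tune per load
            },
        }],
    }


def batch_transform_request(job_name: str, model_name: str,
                            input_s3_uri: str, output_s3_uri: str,
                            instance_type: str = "ml.m5.xlarge",
                            instance_count: int = 1) -> Dict[str, Any]:
    """Build CreateTransformJob parameters for offline (non-real-time) inference.

    The instances exist only for the duration of the job.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",  # assumption: CSV input, one record per line
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,    # assumed instance type
            "InstanceCount": instance_count,
        },
    }


def submit(endpoint_kwargs: Dict[str, Any], transform_kwargs: Dict[str, Any]) -> None:
    """Send both requests to SageMaker. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without AWS access

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_endpoint_config(**endpoint_kwargs)
    sagemaker.create_transform_job(**transform_kwargs)


# Hypothetical names and buckets, for illustration only.
endpoint_kwargs = serverless_endpoint_config("demo-serverless-config", "demo-model")
transform_kwargs = batch_transform_request(
    "demo-nightly-scoring", "demo-model",
    "s3://example-bucket/input/", "s3://example-bucket/output/",
)
```

Which option fits depends on the latency your SLA actually requires: the serverless configuration still answers individual requests on demand, while the batch transform request suits workloads where results can wait until a scheduled job finishes.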

Documents

Blogs

Optimize AI/ML workloads for sustainability: Part 3, deployment and monitoring

Videos

AWS re:Invent 2021 - Architecting for sustainability - Optimize capacity for Sustainability
