MLSUS-11: Align SLAs with sustainability goals
Define service level agreements (SLAs) that support your sustainability goals while meeting, but not exceeding, your business requirements. Make trade-offs that significantly reduce environmental impacts in exchange for acceptable decreases in service levels.
Implementation plan
- Queue incoming requests and process them asynchronously - If your users can tolerate some latency, deploy your model on serverless or asynchronous endpoints to reduce resources that are idle between tasks and minimize the impact of load spikes. These options will automatically scale the instance or endpoint count to zero when there are no requests to process, so you only maintain an inference infrastructure when your endpoint is processing requests.
- Adjust availability - If your users can tolerate some latency in the rare case of a failover, don't provision extra capacity. If an outage occurs or an instance fails, Amazon SageMaker automatically attempts to distribute your instances across Availability Zones. Adjusting availability is an example of a conscious trade-off you can make to meet your sustainability targets.
- Adjust response time - When you don't need real-time inference, use SageMaker Batch Transform. Unlike a persistent endpoint, a batch transform cluster is decommissioned when its job finishes, so you don't continuously maintain an inference infrastructure.
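The options above can be sketched as request payloads for the SageMaker and Application Auto Scaling APIs. This is a minimal illustration, not a recommended configuration: the helper function names, the variant name "AllTraffic", the default memory size, concurrency, instance type, and content type are all assumptions chosen for the example, and you would tune them to your own SLA.

```python
from typing import Any, Dict

# Memory sizes accepted by SageMaker serverless endpoints (1 GB steps, 1-6 GB).
VALID_SERVERLESS_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_endpoint_config(config_name: str, model_name: str,
                               memory_mb: int = 2048,
                               max_concurrency: int = 5) -> Dict[str, Any]:
    """Build kwargs for SageMaker create_endpoint_config with a serverless variant.

    A serverless variant declares no instance type or count, so no
    instances are kept running between requests.
    """
    if memory_mb not in VALID_SERVERLESS_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {VALID_SERVERLESS_MEMORY_MB}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",  # assumed variant name
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": memory_mb,
                "MaxConcurrency": max_concurrency,
            },
        }],
    }


def async_scale_to_zero_target(endpoint_name: str,
                               variant_name: str = "AllTraffic",
                               max_instances: int = 2) -> Dict[str, Any]:
    """Build kwargs for Application Auto Scaling register_scalable_target.

    MinCapacity=0 allows an asynchronous endpoint to scale in to zero
    instances while its request queue is empty.
    """
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": max_instances,
    }


def batch_transform_job(job_name: str, model_name: str,
                        input_s3_uri: str, output_s3_uri: str,
                        instance_type: str = "ml.m5.large",
                        instance_count: int = 1) -> Dict[str, Any]:
    """Build kwargs for SageMaker create_transform_job.

    The instances exist only for the duration of the job.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": input_s3_uri}},
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": instance_type,
                               "InstanceCount": instance_count},
    }


# Usage (requires boto3 and AWS credentials; resource names are hypothetical):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(**serverless_endpoint_config("demo-config", "demo-model"))
#   sm.create_transform_job(**batch_transform_job(
#       "demo-job", "demo-model", "s3://my-bucket/in/", "s3://my-bucket/out/"))
#   boto3.client("application-autoscaling").register_scalable_target(
#       **async_scale_to_zero_target("demo-async-endpoint"))
```

Keeping the payloads as plain dictionaries makes the trade-off explicit and reviewable: the serverless variant has no fixed instance count, the asynchronous target permits zero instances, and the batch job requests capacity only for the run.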
Documents
Blogs
Videos
- AWS re:Invent 2021 - Architecting for sustainability
- Optimize capacity for Sustainability