MLSUS-12: Use efficient silicon
Use the most efficient instance type compatible with your ML workload.
Implementation plan
AWS offers several purpose-built compute architectures that are optimized to minimize the sustainability impact of ML workloads:
- For CPU-based ML inference, use AWS Graviton3 - These processors offer the best performance per watt in Amazon EC2. They use up to 60% less energy for the same performance than comparable EC2 instances. Graviton3 processors deliver up to three times better performance compared to Graviton2 processors for ML workloads, and they support bfloat16 (see the first sketch after this list).
- For deep learning inference, use AWS Inferentia - The Amazon EC2 Inf2 instances offer up to 50% better performance per watt over comparable Amazon EC2 instances because they and the underlying Inferentia2 accelerators are purpose-built to run deep learning models at scale. Inf2 instances help you meet your sustainability goals when deploying ultra-large models (see the compilation sketch below).
- For training, use AWS Trainium - The Amazon EC2 Trn1 instances, based on the custom-designed AWS Trainium chips, offer up to 50% cost-to-train savings over comparable Amazon EC2 instances. When using a Trainium-based instance cluster, the total energy consumption for training BERT Large from scratch is approximately 25% lower than for a same-sized cluster of comparable accelerated EC2 instances (see the training sketch below).
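
The sketch below shows one way to take advantage of Graviton3's bfloat16 support for CPU-based inference with PyTorch. It is a minimal, hypothetical example: the model, input shape, and use of torch.autocast are illustrative assumptions, not a prescribed configuration.

```python
import torch
import torch.nn as nn

# Placeholder model; any CPU-friendly PyTorch model works the same way.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
x = torch.randn(1, 512)

# Graviton3 supports bfloat16 natively; autocast runs eligible ops in
# bfloat16 on CPU, reducing memory traffic and energy per inference.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(x)

print(logits.shape)
```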
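For Inf2, models are typically compiled ahead of time with the AWS Neuron SDK. The following is a minimal sketch assuming torch-neuronx and the Neuron runtime are installed on an Inf2 instance; the model, input shape, and file name are placeholders.

```python
import torch
import torch.nn as nn
import torch_neuronx  # AWS Neuron SDK for PyTorch (assumed installed)

# Placeholder model and example input; replace with your real model.
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
example_input = torch.randn(1, 512)

# Ahead-of-time compilation targeting the Inferentia2 NeuronCores on Inf2.
neuron_model = torch_neuronx.trace(model, example_input)

# The compiled model is a TorchScript artifact; save and reload as usual.
torch.jit.save(neuron_model, "model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example_input).shape)
```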
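On Trn1, the Neuron SDK exposes Trainium through PyTorch/XLA. Below is a minimal, hypothetical training-loop sketch; the linear model, synthetic batches, and hyperparameters are placeholders chosen for illustration, not a recommended setup.

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm  # provided with torch-neuronx/torch-xla

device = xm.xla_device()  # resolves to a Trainium NeuronCore on Trn1

# Placeholder model, loss, and optimizer.
model = nn.Linear(512, 2).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    inputs = torch.randn(8, 512).to(device)       # synthetic batch
    labels = torch.randint(0, 2, (8,)).to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    xm.mark_step()  # flush the lazily built XLA graph for this step
```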