MLSUS-12: Use efficient silicon - Machine Learning Lens

Use the most efficient instance type compatible with your ML workload.

Implementation plan

AWS offers several purpose-built compute architectures that are optimized to minimize the sustainability impact of ML workloads:

  • For CPU-based ML inference, use AWS Graviton3 - These processors offer the best performance per watt in Amazon EC2, using up to 60% less energy than comparable EC2 instances. For ML workloads, Graviton3 processors deliver up to three times better performance than Graviton2 processors, and they support the bfloat16 data type.
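As a sketch of what a migration to Graviton3 might look like at the instance-selection level, the helper below maps common x86 EC2 instance families to their Graviton3-based counterparts while keeping the same size. The mapping table and function name are illustrative, not part of any AWS API.

```python
# Illustrative mapping of x86 EC2 instance families to their
# Graviton3-based (Arm) counterparts; extend as needed.
GRAVITON3_EQUIVALENTS = {
    "c6i": "c7g",  # compute optimized
    "m6i": "m7g",  # general purpose
    "r6i": "r7g",  # memory optimized
}

def to_graviton3(instance_type: str) -> str:
    """Return the Graviton3 equivalent of an x86 instance type,
    preserving the size suffix (e.g. c6i.2xlarge -> c7g.2xlarge)."""
    family, _, size = instance_type.partition(".")
    graviton_family = GRAVITON3_EQUIVALENTS.get(family)
    if graviton_family is None:
        raise ValueError(f"no Graviton3 mapping defined for {family}")
    return f"{graviton_family}.{size}"
```

For example, `to_graviton3("c6i.2xlarge")` returns `"c7g.2xlarge"`; after switching, benchmark the workload on the new instance type to confirm the efficiency gain.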

  • For deep learning inference, use AWS Inferentia - Amazon EC2 Inf2 instances offer up to 50% better performance per watt than comparable Amazon EC2 instances because the underlying Inferentia2 accelerators are purpose-built to run deep learning models at scale. Inf2 instances help you meet your sustainability goals when deploying ultra-large models.
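One common way to deploy on Inferentia2 is through a SageMaker real-time endpoint on an `ml.inf2` instance. The sketch below only builds the request payload for a `CreateEndpointConfig` call; the model name is hypothetical, and in real use you would pass the result to a boto3 SageMaker client after compiling the model with the AWS Neuron SDK.

```python
def inf2_endpoint_config(model_name: str,
                         instance_type: str = "ml.inf2.xlarge",
                         instance_count: int = 1) -> dict:
    """Build a CreateEndpointConfig payload that targets
    Inferentia2-based SageMaker instances. Purely local:
    no AWS call is made here."""
    return {
        "EndpointConfigName": f"{model_name}-inf2-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }
```

With boto3 this payload would be passed as keyword arguments to `sagemaker_client.create_endpoint_config(**inf2_endpoint_config("my-model"))`.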

  • For training, use AWS Trainium - Amazon EC2 Trn1 instances, based on custom-designed AWS Trainium chips, offer up to 50% cost-to-train savings over comparable Amazon EC2 instances. When using a Trainium-based instance cluster, the total energy consumption for training BERT Large from scratch is approximately 25% lower than for a same-sized cluster of comparable accelerated EC2 instances.
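To make the energy figure above concrete, the small helper below applies the cited ~25% reduction to a measured baseline. The function and the example baseline value are illustrative; actual savings depend on the model, cluster size, and training configuration.

```python
def estimated_trainium_energy(baseline_kwh: float,
                              savings_fraction: float = 0.25) -> float:
    """Estimate the energy of a training run on a Trainium cluster,
    given the measured energy of a same-sized cluster of comparable
    accelerated EC2 instances and an assumed savings fraction
    (default 0.25, the approximate figure cited for BERT Large)."""
    if not 0.0 <= savings_fraction < 1.0:
        raise ValueError("savings_fraction must be in [0, 1)")
    return baseline_kwh * (1.0 - savings_fraction)
```

For instance, a run that consumes 1,000 kWh on a comparable accelerated cluster would be estimated at roughly 750 kWh on Trainium under this assumption.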
