Amazon SageMaker HyperPod Amazon Machine Images (AMIs) are specialized machine images for distributed machine learning workloads and high-performance computing. These AMIs enhance base images with essential components including GPU drivers and AWS Neuron accelerator support.
Key components added to HyperPod AMIs include:
- Advanced orchestration tools
- Cluster management dependencies
- Built-in resiliency features:
  - Cluster health check
  - Auto-resume capabilities
- Support for HyperPod cluster management and configuration
These enhancements are built upon the following base Deep Learning AMIs (DLAMIs):
- AWS Deep Learning AMIs Base GPU AMI (Ubuntu 20.04) for orchestration with Slurm.
- Amazon Linux 2 based AMI for orchestration with Amazon EKS.
Choose your HyperPod AMIs based on your orchestration preference:
- For Slurm orchestration, see SageMaker HyperPod AMI releases for Slurm.
- For Amazon EKS orchestration, see SageMaker HyperPod AMI releases for Amazon EKS.
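
If you are unsure which orchestrator an existing cluster uses, the choice above can be sketched as a small lookup. This is a minimal illustration, not an official tool: the helper names are hypothetical, the input is assumed to be a dictionary shaped like a cluster description with an optional `Orchestrator` entry containing an `Eks` key, and treating a missing entry as Slurm is also an assumption.

```python
# Hypothetical helpers: map a cluster description to the AMI release notes to read.
# Assumption: an EKS-orchestrated cluster carries {"Orchestrator": {"Eks": {...}}};
# a description without that entry is treated as Slurm-orchestrated.

RELEASE_NOTES = {
    "slurm": "SageMaker HyperPod AMI releases for Slurm",
    "eks": "SageMaker HyperPod AMI releases for Amazon EKS",
}


def orchestrator_of(cluster_description: dict) -> str:
    """Return "eks" if the description has an EKS orchestrator entry, else "slurm"."""
    orchestrator = cluster_description.get("Orchestrator") or {}
    return "eks" if "Eks" in orchestrator else "slurm"


def release_notes_for(cluster_description: dict) -> str:
    """Return the title of the AMI release notes page that matches the cluster."""
    return RELEASE_NOTES[orchestrator_of(cluster_description)]
```

For example, `release_notes_for({"ClusterName": "demo"})` selects the Slurm release notes, while a description that includes an `Eks` orchestrator entry selects the Amazon EKS release notes.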
For information about Amazon SageMaker HyperPod feature releases, see Amazon SageMaker HyperPod release notes.