SageMaker HyperPod AMI releases for Amazon EKS
The following release notes track the latest updates for Amazon SageMaker HyperPod AMI releases
for Amazon EKS orchestration. Each release note includes a summarized list of packages
pre-installed or pre-configured in the SageMaker HyperPod DLAMIs for Amazon EKS support. Each DLAMI
is built on Amazon Linux 2 (AL2) and supports a specific Kubernetes version. For
HyperPod DLAMI releases for Slurm orchestration, see SageMaker HyperPod AMI releases for Slurm.
For information about Amazon SageMaker HyperPod feature releases, see Amazon SageMaker HyperPod release notes.
SageMaker HyperPod AMI releases for Amazon EKS: August 25, 2025
SageMaker HyperPod DLAMI for Amazon EKS support
This release includes the following updates:
- Kubernetes v1.28
-
NVIDIA SMI:
Added Packages:
Updated Packages:
gdk-pixbuf2.x86_64: 2.36.12-3.amzn2 → 2.36.12-3.amzn2.0.2
kernel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-devel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-headers.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-tools.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
libgs.x86_64: 9.54.0-9.amzn2.0.11 → 9.54.0-9.amzn2.0.12
microcode_ctl.x86_64: 2:2.1-47.amzn2.4.24 → 2:2.1-47.amzn2.4.25
pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4
Removed Packages:
Repository Changed:
libnvidia-container-tools.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
libnvidia-container1.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit-base.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
- Kubernetes v1.29
-
NVIDIA SMI:
Added Packages:
Updated Packages:
gdk-pixbuf2.x86_64: 2.36.12-3.amzn2 → 2.36.12-3.amzn2.0.2
kernel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-devel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-headers.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-tools.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
libgs.x86_64: 9.54.0-9.amzn2.0.11 → 9.54.0-9.amzn2.0.12
microcode_ctl.x86_64: 2:2.1-47.amzn2.4.24 → 2:2.1-47.amzn2.4.25
pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4
Removed Packages:
Repository Changed:
libnvidia-container-tools.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
libnvidia-container1.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit-base.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
- Kubernetes v1.30
-
NVIDIA SMI:
Added Packages:
Updated Packages:
aws-neuronx-dkms.noarch: 2.22.2.0-dkms → 2.23.9.0-dkms
efa.x86_64: 2.15.3-1.amzn2 → 2.17.2-1.amzn2
efa-nv-peermem.x86_64: 1.2.1-1.amzn2 → 1.2.2-1.amzn2
gdk-pixbuf2.x86_64: 2.36.12-3.amzn2 → 2.36.12-3.amzn2.0.2
ibacm.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
infiniband-diags.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
kernel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-devel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-headers.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-tools.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
libfabric-aws.x86_64: 2.1.0amzn3.0-1.amzn2 → 2.1.0amzn5.0-1.amzn2
libfabric-aws-devel.x86_64: 2.1.0amzn3.0-1.amzn2 → 2.1.0amzn5.0-1.amzn2
libgs.x86_64: 9.54.0-9.amzn2.0.11 → 9.54.0-9.amzn2.0.12
libibumad.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs-core.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs-utils.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libnccl-ofi.x86_64: 1.15.0-1.amzn2 → 1.16.2-1.amzn2
librdmacm.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
librdmacm-utils.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
microcode_ctl.x86_64: 2:2.1-47.amzn2.4.24 → 2:2.1-47.amzn2.4.25
pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4
rdma-core.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
rdma-core-devel.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
Removed Packages:
Repository Changed:
libnvidia-container-tools.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
libnvidia-container1.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit-base.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
- Kubernetes v1.31
-
NVIDIA SMI:
Added Packages:
Updated Packages:
gdk-pixbuf2.x86_64: 2.36.12-3.amzn2 → 2.36.12-3.amzn2.0.2
kernel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-devel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-headers.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-tools.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
libgs.x86_64: 9.54.0-9.amzn2.0.11 → 9.54.0-9.amzn2.0.12
microcode_ctl.x86_64: 2:2.1-47.amzn2.4.24 → 2:2.1-47.amzn2.4.25
pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4
Removed Packages:
Repository Changed:
libnvidia-container-tools.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
libnvidia-container1.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit-base.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
- Kubernetes v1.32
-
NVIDIA SMI:
Added Packages:
Updated Packages:
aws-neuronx-dkms.noarch: 2.22.2.0-dkms → 2.23.9.0-dkms
efa.x86_64: 2.15.3-1.amzn2 → 2.17.2-1.amzn2
efa-nv-peermem.x86_64: 1.2.1-1.amzn2 → 1.2.2-1.amzn2
gdk-pixbuf2.x86_64: 2.36.12-3.amzn2 → 2.36.12-3.amzn2.0.2
ibacm.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
infiniband-diags.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
kernel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-devel.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-headers.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
kernel-tools.x86_64: 5.10.239-236.958.amzn2 → 5.10.240-238.955.amzn2
libfabric-aws.x86_64: 2.1.0amzn3.0-1.amzn2 → 2.1.0amzn5.0-1.amzn2
libfabric-aws-devel.x86_64: 2.1.0amzn3.0-1.amzn2 → 2.1.0amzn5.0-1.amzn2
libgs.x86_64: 9.54.0-9.amzn2.0.11 → 9.54.0-9.amzn2.0.12
libibumad.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs-core.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libibverbs-utils.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
libnccl-ofi.x86_64: 1.15.0-1.amzn2 → 1.16.2-1.amzn2
librdmacm.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
librdmacm-utils.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
microcode_ctl.x86_64: 2:2.1-47.amzn2.4.24 → 2:2.1-47.amzn2.4.25
pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4
rdma-core.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
rdma-core-devel.x86_64: 57.amzn1-1.amzn2.0.2 → 58.amzn0-1.amzn2.0.2
Removed Packages:
Repository Changed:
libnvidia-container-tools.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
libnvidia-container1.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
nvidia-container-toolkit-base.x86_64: cuda-rhel8-x86_64 → nvidia-container-toolkit
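The package entries above follow a regular "name.arch: old → new" shape, so they can be checked against a node mechanically. The following is a minimal sketch, assuming that line shape holds for every entry; the names PackageChange and parse_change are hypothetical helpers and not part of any AWS tool.

```python
import re
from typing import NamedTuple, Optional


class PackageChange(NamedTuple):
    name: str  # package name, for example "kernel-devel"
    arch: str  # architecture suffix, for example "x86_64" or "noarch"
    old: str   # value before the arrow (a version, or a repository name)
    new: str   # value after the arrow


# Matches lines such as "pam.x86_64: 1.1.8-23.amzn2.0.2 → 1.1.8-23.amzn2.0.4".
# The last dot before the colon separates the package name from the architecture.
_LINE = re.compile(
    r"^(?P<name>[^\s:]+)\.(?P<arch>[^.\s:]+):\s*(?P<old>\S+)\s*→\s*(?P<new>\S+)$"
)


def parse_change(line: str) -> Optional[PackageChange]:
    """Parse one release-note entry; return None for headings and blank lines."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return PackageChange(match["name"], match["arch"], match["old"], match["new"])
```

Entries under "Repository Changed" parse the same way, with the old and new fields holding repository names instead of versions.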
SageMaker HyperPod AMI releases for Amazon EKS: August 6, 2025
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following updates:
- K8s v1.28
  - Neuron packages:
    - aws-neuronx-collectives: 2.27.34.0_ec8cd5e8b-1
    - aws-neuronx-dkms: 2.23.9.0-dkms
    - aws-neuronx-runtime-lib: 2.27.23.0_8deec4dbf-1
    - aws-neuronx-k8-plugin: 2.27.7.0-1
    - aws-neuronx-k8-scheduler: 2.27.7.0-1
    - aws-neuronx-tools: 2.25.145.0-1
- K8s v1.29
  - Neuron packages:
    - aws-neuronx-collectives: 2.27.34.0_ec8cd5e8b-1
    - aws-neuronx-dkms: 2.23.9.0-dkms
    - aws-neuronx-runtime-lib: 2.27.23.0_8deec4dbf-1
    - aws-neuronx-k8-plugin: 2.27.7.0-1
    - aws-neuronx-k8-scheduler: 2.27.7.0-1
    - aws-neuronx-tools: 2.25.145.0-1
- K8s v1.30
  - Neuron packages:
    - aws-neuronx-collectives: 2.27.34.0_ec8cd5e8b-1
    - aws-neuronx-dkms: 2.23.9.0-dkms
    - aws-neuronx-runtime-lib: 2.27.23.0_8deec4dbf-1
    - aws-neuronx-k8-plugin: 2.27.7.0-1
    - aws-neuronx-k8-scheduler: 2.27.7.0-1
    - aws-neuronx-tools: 2.25.145.0-1
- K8s v1.31
  - Neuron packages:
    - aws-neuronx-collectives: 2.27.34.0_ec8cd5e8b-1
    - aws-neuronx-dkms: 2.23.9.0-dkms
    - aws-neuronx-runtime-lib: 2.27.23.0_8deec4dbf-1
    - aws-neuronx-k8-plugin: 2.27.7.0-1
    - aws-neuronx-k8-scheduler: 2.27.7.0-1
    - aws-neuronx-tools: 2.25.145.0-1
- K8s v1.32
  - Neuron packages:
    - aws-neuronx-collectives: 2.27.34.0_ec8cd5e8b-1
    - aws-neuronx-dkms: 2.23.9.0-dkms
    - aws-neuronx-runtime-lib: 2.27.23.0_8deec4dbf-1
    - aws-neuronx-k8-plugin: 2.27.7.0-1
    - aws-neuronx-k8-scheduler: 2.27.7.0-1
    - aws-neuronx-tools: 2.25.145.0-1
- Deep Learning Base OSS Nvidia Driver AMI (Amazon Linux 2) Version 70.3
- Deep Learning Base Proprietary Nvidia Driver AMI (Amazon Linux 2) Version 68.4
- Latest CUDA 12.8 support
- Upgraded the NVIDIA driver from 570.158.01 to 570.172.08 to fix the CVEs listed in the
  NVIDIA Security Bulletin for July
SageMaker HyperPod AMI releases for Amazon EKS: July 31, 2025
Amazon SageMaker HyperPod now supports a new AMI for Amazon EKS clusters that updates the base operating system to Amazon Linux 2023.
This release provides several improvements over Amazon Linux 2 (AL2). HyperPod releases new AMIs regularly, and we recommend
that you run all of your HyperPod clusters on the latest and most secure versions of AMIs to address vulnerabilities and phase
out outdated software and libraries.
Key upgrades
- Operating System: Amazon Linux 2023 (updated from Amazon Linux 2, or AL2)
- Package Manager: DNF is the default package management tool, replacing YUM used in AL2
- Networking Service: systemd-networkd manages network interfaces, replacing ISC dhclient used in AL2
- Linux Kernel: Version 6.1, updated from the kernel used in AL2
- Glibc: Version 2.34, updated from the version in AL2
- GCC: Version 11.5.0, updated from the version in AL2
- NFS: Version 1:2.6.1, updated from version 1:1.3.4 in AL2
- NVIDIA Driver: Version 570.172.08, a newer driver version
- Python: Version 3.9, replacing Python 2.7 used in AL2
- NVME: Version 1.11.1, a newer version of the NVMe driver
Before you upgrade
There are a few important things to know before upgrading. With AL2023, several packages have been added,
upgraded or removed compared to AL2. We strongly recommend that you test your applications with AL2023 before
upgrading your clusters. For a comprehensive list of all package changes in AL2023, see Package changes in Amazon Linux 2023.
The following are some of the significant changes between AL2 and AL2023:
- Python 3.10: The most significant update, apart from the operating system,
  is the Python version upgrade. After upgrading, clusters have Python 3.10 as default. While some Python 3.8 distributed
  training workloads might be compatible with Python 3.10, we strongly recommend that you test your specific workloads
  separately. If migration to Python 3.10 proves challenging but you still want to upgrade your cluster for other new features,
  you can install an older Python version by using the command yum install python-xx.x with
  lifecycle scripts before running any workloads.
  Ensure you test both your existing lifecycle scripts and application code for compatibility.
- NVIDIA runtime enforcement: AL2023 strictly enforces the NVIDIA container runtime requirements, causing
  containers with hard-coded NVIDIA environment variables (such as NVIDIA_VISIBLE_DEVICES: "all") to fail on
  CPU-only nodes (whereas AL2 ignored these settings when no GPU drivers were present). You can override the
  enforcement by setting NVIDIA_VISIBLE_DEVICES: "void" in your pod specification or by using CPU-only images.
- cgroup v2: AL2023 features the next generation of unified control group
  hierarchy (cgroup v2). cgroup v2 is used for container runtimes and is also used by systemd. While AL2023
  still includes code that can make the system run using cgroup v1, this isn't a recommended configuration.
- Amazon VPC CNI and eksctl versions: AL2023 also requires your Amazon VPC CNI version to be 1.16.2 or greater
  and your eksctl version to be 0.176.0 or greater.
- EFA on FSx for Lustre: You can now use EFA on FSx for Lustre, which
  enables you to achieve application performance comparable to on-premises AI/ML or HPC (high performance computing) clusters,
  while benefiting from the scalability, flexibility and elasticity of cloud computing.
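The minimum versions listed above for the Amazon VPC CNI (1.16.2) and eksctl (0.176.0) can be checked with a small comparison helper before you start an upgrade. The following is a minimal sketch: the helper names are hypothetical, and it assumes that version strings look like "v1.18.3-eksbuild.2" or "0.176.0", that is, dotted integers with an optional leading "v" and an optional "-suffix". How you obtain the installed version strings is left to your own tooling.

```python
from typing import Tuple

# Minimum versions quoted in the release notes for AL2023.
MIN_VPC_CNI = "1.16.2"
MIN_EKSCTL = "0.176.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert 'v1.16.2-eksbuild.1' or '0.176.0' into a tuple of integers."""
    # Drop a leading "v" and anything after the first "-" (assumed build suffix).
    core = text.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True when the installed version is at least the minimum."""
    # Tuples compare element by element, so 1.9.x correctly sorts below 1.16.x.
    return parse_version(installed) >= parse_version(minimum)
```

Comparing integer tuples rather than raw strings matters here, because a plain string comparison would rank "1.9.3" above "1.16.2".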
Additionally, upgrading to AL2023 requires at minimum version 1.0.643.0_1.0.192.0
of Health Monitoring Agent.
Complete the following procedure to update the Health Monitoring Agent:
- If you use HyperPod lifecycle scripts from the GitHub repository awsome-distributed-training,
  make sure to pull the latest version. Earlier versions are not compatible with AL2023.
  The new lifecycle script ensures that containerd uses the additional mounted
  storage for pulling in container images in AL2023.
- Pull the latest version of the HyperPod CLI git repository.
- Update dependencies with the following command: helm dependencies update helm_chart/HyperPodHelmChart
- As described in step 4 of the README of HyperPodHelmChart,
  run the following command to upgrade the version of dependencies running on the cluster:
  helm upgrade dependencies helm_chart/HyperPodHelmChart --namespace kube-system
Workloads that have been tested on upgraded EKS clusters
The following are some use cases where the upgrade has been tested:
Backwards compatibility: Popular distributed training jobs
involving PyTorch should be backwards compatible on the new AMI.
However, since your workloads may depend on specific Python or Linux
libraries, we recommend first testing on a smaller scale or subset of
nodes before upgrading your larger clusters.
Accelerator testing: Jobs across various instance types, utilizing both NVIDIA accelerators
(for the P and G instance families) and AWS Neuron accelerators (for Trn instances) have been tested.
How to upgrade your AMI and associated workloads
You can upgrade to the new AMI using one of the following methods:
The cluster is unavailable during the update process. We recommend planning for this downtime and restarting the training
workload from an existing checkpoint after the upgrade completes. As a best practice, we recommend that you perform testing
on a smaller cluster before upgrading your larger clusters.
If the update command fails, first identify the cause of the failure. For lifecycle script failures, make the necessary corrections
to your scripts and retry. For any other issues that cannot be resolved, contact
AWS Support.
Troubleshooting
Use the following section to help with troubleshooting any issues you encounter when upgrading to AL2023.
How do I fix errors such as "nvml error: driver not loaded: unknown" on CPU-only cluster nodes?
If containers that worked on CPU-only AL2 Amazon EKS nodes now fail on AL2023, your container image may have
hard-coded NVIDIA environment variables. You can check for hard-coded environment variables with the following command:
docker inspect image:tag | grep -i nvidia
AL2023 strictly enforces these requirements whereas AL2 was more lenient on
CPU-only nodes. One solution is to override the AL2023 enforcement by setting certain NVIDIA environment
variables in your Amazon EKS pod specification, as shown in the following example:
containers:
  - name: your-container
    image: your-image:tag
    env:
      - name: NVIDIA_VISIBLE_DEVICES
        value: "void"
      - name: NVIDIA_DRIVER_CAPABILITIES
        value: ""
Another alternative is to use CPU-only container images (such as pytorch/pytorch:latest-cpu) or build
custom images without NVIDIA dependencies.
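The check for hard-coded NVIDIA environment variables described in this section can also be run over the JSON that docker inspect prints, which avoids false matches from labels or layer history. The following is a minimal sketch. It assumes the usual docker inspect output shape (a JSON array whose entries carry a Config.Env list of "KEY=VALUE" strings), and nvidia_env_vars is a hypothetical helper name.

```python
import json
from typing import Dict


def nvidia_env_vars(inspect_json: str) -> Dict[str, str]:
    """Return the NVIDIA_* variables baked into an image.

    `inspect_json` is the text printed by `docker inspect image:tag`.
    """
    found: Dict[str, str] = {}
    for entry in json.loads(inspect_json):
        config = entry.get("Config") or {}
        # Env may be missing or null for images that set no variables.
        for item in config.get("Env") or []:
            key, _, value = item.partition("=")
            if key.upper().startswith("NVIDIA_"):
                found[key] = value
    return found
```

An empty result suggests the image does not hard-code NVIDIA settings. A result such as NVIDIA_VISIBLE_DEVICES set to "all" points to the override or the CPU-only image approach described above.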
SageMaker HyperPod AMI releases for Amazon EKS: July 15, 2025
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following updates:
- K8s v1.28
-
- K8s v1.29
-
- K8s v1.30
-
- K8s v1.31
-
- K8s v1.32
-
SageMaker HyperPod AMI releases for Amazon EKS: June 09, 2025
SageMaker HyperPod DLAMI for Amazon EKS support
- Neuron SDK Updates
-
SageMaker HyperPod AMI releases for Amazon EKS: May 22, 2025
AMI general updates
SageMaker HyperPod DLAMI for Amazon EKS support
- Deep Learning Base AMI AL2
-
- Neuron SDK Updates
  - aws-neuronx-dkms.noarch: 2.20.74.0 (from 2.20.28.0)
  - aws-neuronx-collectives.x86_64: 2.25.65.0_9858ac9a1-1 (from 2.24.59.0_838c7fc8b-1)
  - aws-neuronx-runtime-lib.x86_64: 2.25.57.0_166c7a468-1 (from 2.24.53.0_f239092cc-1)
  - aws-neuronx-tools.x86_64: 2.23.9.0 (from 2.22.61.0)
  - aws-neuronx-gpsimd-customop-lib.x86_64: 0.15.12.0 (from 0.14.12.0)
  - aws-neuronx-gpsimd-tools.x86_64: 0.15.1.0_5d31b6a3f (from 0.14.6.0_241eb69f4)
  - aws-neuronx-k8-plugin.x86_64: 2.25.24.0 (from 2.24.23.0)
  - aws-neuronx-k8-scheduler.x86_64: 2.25.24.0 (from 2.24.23.0)
Support notes:
- AMI components, including CUDA versions, may be removed or changed based on the framework support policy.
- The kernel version is pinned for compatibility. Avoid kernel updates unless they are required for security patches.
- For EC2 instances with multiple network cards, refer to the EFA configuration guide for proper setup.
SageMaker HyperPod AMI releases for Amazon EKS: May 07, 2025
- Installed the latest version of AWS Neuron SDK
-
SageMaker HyperPod AMI releases for Amazon EKS: April 28, 2025
Improvements for K8s
SageMaker HyperPod DLAMI for Amazon EKS support
- Installed the latest version of AWS Neuron SDK
  - aws-neuronx-dkms.noarch: 2.20.28.0-dkms
  - aws-neuronx-oci-hook.x86_64: 2.4.4.0-1
  - aws-neuronx-tools.x86_64: 2.18.3.0-1
  - aws-neuron-dkms.noarch: 2.3.26.0-dkms
  - aws-neuron-k8-plugin.x86_64: 1.9.3.0-1
  - aws-neuron-k8-scheduler.x86_64: 1.9.3.0-1
  - aws-neuron-runtime.x86_64: 1.6.24.0-1
  - aws-neuron-runtime-base.x86_64: 1.6.21.0-1
  - aws-neuron-tools.x86_64: 2.1.4.0-1
  - aws-neuronx-collectives.x86_64: 2.24.59.0_838c7fc8b-1
  - aws-neuronx-gpsimd-customop.x86_64: 0.2.3.0-1
  - aws-neuronx-gpsimd-customop-lib.x86_64: 0.14.12.0-1
  - aws-neuronx-gpsimd-tools.x86_64: 0.14.6.0_241eb69f4-1
  - aws-neuronx-k8-plugin.x86_64: 2.24.23.0-1
  - aws-neuronx-k8-scheduler.x86_64: 2.24.23.0-1
  - aws-neuronx-runtime-lib.x86_64: 2.24.53.0_f239092cc-1
  - aws-neuronx-tools.x86_64: 2.22.61.0-1
  - tensorflow-model-server-neuronx.x86_64: 2.10.1.2.12.2.0-0
SageMaker HyperPod AMI releases for Amazon EKS: April 18, 2025
AMI general updates
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following:
- Deep Learning EKS AMI 1.32.1
  - Amazon EKS Components
  - Kubernetes Version: 1.32.1
  - Containerd Version: 1.7.27
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.29
  - Amazon SSM Agent: 3.3.1611.0
  - Linux Kernel: 5.10.235
  - OSS Nvidia driver: 550.163.01
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.38.0
  - GDRCopy: 2.4.1-1
  - Nvidia container toolkit: 1.17.6
  - AWS OFI NCCL: 1.13.2
  - aws-neuronx-tools: 2.18.3.0
  - aws-neuronx-runtime-lib: 2.24.53.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.20.28.0
  - aws-neuronx-collectives: 2.24.59.0
SageMaker HyperPod AMI releases for Amazon EKS: February 18, 2025
Improvements for K8s
- Upgraded Nvidia container toolkit from version 1.17.3 to version 1.17.4.
- Fixed the issue where customers were unable to connect to nodes after a reboot.
- Upgraded Elastic Fabric Adapter (EFA) version from 1.37.0 to 1.38.0.
- The EFA now includes the AWS OFI NCCL plugin, which is located in the /opt/amazon/ofi-nccl
  directory instead of the original /opt/aws-ofi-nccl/ path. If you need to update your
  LD_LIBRARY_PATH environment variable, make sure to modify the path to point to the new
  /opt/amazon/ofi-nccl location for the OFI NCCL plugin.
- Removed the emacs package from these DLAMIs. You can install emacs from GNU Emacs.
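The LD_LIBRARY_PATH change described above can be scripted so that job launchers keep working across both AMI generations. The following is a minimal sketch: migrate_library_path is a hypothetical helper, and it assumes that any subdirectory under the old location (for example, a lib directory) exists at the same relative path under the new one, which you should confirm on a node before relying on it.

```python
OLD_PREFIX = "/opt/aws-ofi-nccl"
NEW_PREFIX = "/opt/amazon/ofi-nccl"


def migrate_library_path(value: str) -> str:
    """Rewrite LD_LIBRARY_PATH entries that point at the old OFI NCCL location."""
    entries = []
    for entry in value.split(":"):
        # Only rewrite exact matches or paths below the old prefix, so that an
        # unrelated directory such as /opt/aws-ofi-nccl-backup is left alone.
        if entry == OLD_PREFIX or entry.startswith(OLD_PREFIX + "/"):
            entry = NEW_PREFIX + entry[len(OLD_PREFIX):]
        # Drop empty entries and duplicates while preserving order.
        if entry and entry not in entries:
            entries.append(entry)
    return ":".join(entries)
```

For example, a lifecycle script or job wrapper could set os.environ["LD_LIBRARY_PATH"] to migrate_library_path(os.environ.get("LD_LIBRARY_PATH", "")) before launching a training process.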
SageMaker HyperPod DLAMI for Amazon EKS support
- Installed the latest version of neuron SDK
  - aws-neuronx-dkms.noarch: 2.19.64.0-dkms @neuron
  - aws-neuronx-oci-hook.x86_64: 2.4.4.0-1 @neuron
  - aws-neuronx-tools.x86_64: 2.18.3.0-1 @neuron
  - aws-neuronx-collectives.x86_64: 2.23.135.0_3e70920f2-1 neuron
  - aws-neuronx-gpsimd-customop.x86_64: 0.2.3.0-1 neuron
  - aws-neuronx-gpsimd-customop-lib.x86_64
  - aws-neuronx-gpsimd-tools.x86_64: 0.13.2.0_94ba34927-1 neuron
  - aws-neuronx-k8-plugin.x86_64: 2.23.45.0-1 neuron
  - aws-neuronx-k8-scheduler.x86_64: 2.23.45.0-1 neuron
  - aws-neuronx-runtime-lib.x86_64: 2.23.112.0_9b5179492-1 neuron
  - aws-neuronx-tools.x86_64: 2.20.204.0-1 neuron
  - tensorflow-model-server-neuronx.x86_64
SageMaker HyperPod AMI releases for Amazon EKS: January 22, 2025
AMI general updates
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following:
- Deep Learning EKS AMI 1.31
  - Amazon EKS Components
  - Kubernetes Version: 1.31.2
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.10.230
  - OSS Nvidia driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.37.0
  - GDRCopy: 2.4.1-1
  - Nvidia container toolkit: 1.17.3
  - AWS OFI NCCL: 1.13.0
  - aws-neuronx-tools: 2.18.3
  - aws-neuronx-runtime-lib: 2.23.112.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.23.133.0
SageMaker HyperPod AMI releases for Amazon EKS: December 21, 2024
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following:
- K8s v1.28
  - Amazon EKS Components
  - Kubernetes Version: 1.28.15
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.10.228
  - OSS NVIDIA driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.37.0
  - GDRCopy: 2.4
  - NVIDIA container toolkit: 1.17.3
  - AWS OFI NCCL: 1.13.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.23.112.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.23.135.0
- K8s v1.29
  - Amazon EKS Components
  - Kubernetes Version: 1.29.10
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.15.0
  - OSS Nvidia driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.37.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.17.3
  - AWS OFI NCCL: 1.13.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.23.112.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.23.135.0
- K8s v1.30
  - Amazon EKS Components
  - Kubernetes Version: 1.30.6
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987.0
  - Linux Kernel: 5.10.228
  - OSS Nvidia driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.37.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.17.3
  - AWS OFI NCCL: 1.13.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.23.112.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.23.135.0
SageMaker HyperPod AMI releases for Amazon EKS: December 13, 2024
SageMaker HyperPod DLAMI for Amazon EKS upgrade
SageMaker HyperPod AMI releases for Amazon EKS: November 24, 2024
AMI general updates
SageMaker HyperPod AMI releases for Amazon EKS: November 15, 2024
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following:
- Deep Learning EKS AMI 1.28
  - Amazon EKS Components
  - Kubernetes Version: 1.28.15
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.10.228
  - OSS NVIDIA driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.34.0
  - GDRCopy: 2.4
  - NVIDIA container toolkit: 1.17.3
  - AWS OFI NCCL: 1.11.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.22.19.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.22.33.0
- Deep Learning EKS AMI 1.29
  - Amazon EKS Components
  - Kubernetes Version: 1.29.10
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.10.228
  - OSS Nvidia driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.34.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.17.3
  - AWS OFI NCCL: 1.11.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.22.19.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.22.33.0
- Deep Learning EKS AMI 1.30
  - Amazon EKS Components
  - Kubernetes Version: 1.30.6
  - Containerd Version: 1.7.23
  - Runc Version: 1.1.14
  - AWS IAM Authenticator: 0.6.26
  - Amazon SSM Agent: 3.3.987
  - Linux Kernel: 5.10.228
  - OSS Nvidia driver: 550.127.05
  - NVIDIA CUDA: 12.4
  - EFA Installer: 1.34.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.17.3
  - AWS OFI NCCL: 1.11.0
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.22.19.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.18.20.0
  - aws-neuronx-collectives: 2.22.33.0
SageMaker HyperPod AMI releases for Amazon EKS: November 11, 2024
AMI general updates
SageMaker HyperPod AMI releases for Amazon EKS: October 21, 2024
AMI general updates
SageMaker HyperPod AMI releases for Amazon EKS: September 10, 2024
SageMaker HyperPod DLAMI for Amazon EKS support
The AMIs include the following:
- Deep Learning EKS AMI 1.28
  - Amazon EKS Components
  - Kubernetes Version: 1.28.11
  - Containerd Version: 1.7.20
  - Runc Version: 1.1.11
  - AWS IAM Authenticator: 0.6.21
  - Amazon SSM Agent: 3.3.380
  - Linux Kernel: 5.10.223
  - OSS NVIDIA driver: 535.183.01
  - NVIDIA CUDA: 12.2
  - EFA Installer: 1.32.0
  - GDRCopy: 2.4
  - NVIDIA container toolkit: 1.16.1
  - AWS OFI NCCL: 1.9.1
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.21.41.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.17.17.0
  - aws-neuronx-collectives: 2.21.46.0
- Deep Learning EKS AMI 1.29
  - Amazon EKS Components
  - Kubernetes Version: 1.29.6
  - Containerd Version: 1.7.20
  - Runc Version: 1.1.11
  - AWS IAM Authenticator: 0.6.21
  - Amazon SSM Agent: 3.3.380
  - Linux Kernel: 5.10.223
  - OSS Nvidia driver: 535.183.01
  - NVIDIA CUDA: 12.2
  - EFA Installer: 1.32.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.16.1
  - AWS OFI NCCL: 1.9.1
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.21.41.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.17.17.0
  - aws-neuronx-collectives: 2.21.46.0
- Deep Learning EKS AMI 1.30
  - Amazon EKS Components
  - Kubernetes Version: 1.30.2
  - Containerd Version: 1.7.20
  - Runc Version: 1.1.11
  - AWS IAM Authenticator: 0.6.21
  - Amazon SSM Agent: 3.3.380
  - Linux Kernel: 5.10.223
  - OSS Nvidia driver: 535.183.01
  - NVIDIA CUDA: 12.2
  - EFA Installer: 1.32.0
  - GDRCopy: 2.4
  - Nvidia container toolkit: 1.16.1
  - AWS OFI NCCL: 1.9.1
  - aws-neuronx-tools: 2.18.3.0-1
  - aws-neuronx-runtime-lib: 2.21.41.0
  - aws-neuronx-oci-hook: 2.4.4.0-1
  - aws-neuronx-dkms: 2.17.17.0
  - aws-neuronx-collectives: 2.21.46.0