SageMaker data parallelism library release notes
See the following release notes to track the latest updates for the SageMaker distributed data parallelism (SMDDP) library.
The SageMaker distributed data parallelism library v2.3.0
Date: June 11, 2024
New features
-
Added support for PyTorch v2.3.0 with CUDA v12.1 and Python v3.11.
-
Added support for PyTorch Lightning v2.2.5. This is integrated into the SageMaker framework container for PyTorch v2.3.0.
-
Added instance type validation during import to prevent loading the SMDDP library on unsupported instance types. For a list of instance types compatible with the SMDDP library, see Supported frameworks, AWS Regions, and instances types.
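Because the library now validates the instance type at import time, a training script that might also run on unsupported instances can guard the import. The following is a minimal sketch, not the library's documented behavior. The module path `smdistributed.dataparallel.torch.torch_smddp` and the `smddp` backend name come from the SMDDP PyTorch usage documentation. These notes do not state which exception the validation raises, so the sketch catches any `Exception`. The helper name `select_backend` and the fallback to `nccl` are hypothetical choices made for illustration.

```python
import importlib


def select_backend(fallback: str = "nccl") -> str:
    """Return "smddp" if the SMDDP PyTorch module imports cleanly, else `fallback`."""
    try:
        # Importing this module is what makes the "smddp" backend available.
        importlib.import_module("smdistributed.dataparallel.torch.torch_smddp")
    except Exception as err:  # Assumption: the exact exception type is not documented here.
        print(f"SMDDP unavailable ({type(err).__name__}); falling back to {fallback}")
        return fallback
    return "smddp"


backend = select_backend()
```

The returned string could then be passed as the `backend` argument of `torch.distributed.init_process_group`.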
Integration into SageMaker Framework Containers
This version of the SMDDP library is migrated to the following SageMaker Framework Container:
-
PyTorch v2.3.0
763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker
For a complete list of versions of the SMDDP library and the pre-built containers, see Supported frameworks, AWS Regions, and instances types.
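The image URI above contains a `<region>` placeholder that must be replaced with an AWS Region code before the image can be pulled. As a minimal sketch, the substitution might look like the following. The function name `pytorch_image_uri` is hypothetical, and the sketch only fills in the placeholder; it does not check that the image is actually published in the given Region.

```python
# Template copied from the release note; only <region> is substituted.
IMAGE_TEMPLATE = (
    "763104351884.dkr.ecr.{region}.amazonaws.com/"
    "pytorch-training:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker"
)


def pytorch_image_uri(region: str) -> str:
    """Fill the <region> placeholder; does not verify the image exists there."""
    if not region or "<" in region or ">" in region:
        raise ValueError(f"expected an AWS Region code, got {region!r}")
    return IMAGE_TEMPLATE.format(region=region)


print(pytorch_image_uri("us-west-2"))
```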
Binary file of this release
You can download or install the library using the following URL.
https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.3.0/cu121/2024-05-23/smdistributed_dataparallel-2.3.0-cp311-cp311-linux_x86_64.whl
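The wheel file name encodes the library version and the Python, ABI, and platform tags it was built for, so it can be worth checking those tags against the running interpreter before installing. The following is a rough sketch that uses only the standard library and assumes the standard wheel file naming convention. The helper names `wheel_tags` and `matches_interpreter` are hypothetical, and the sketch prints the `pip` command rather than running it.

```python
import sys
from urllib.parse import urlparse

# URL copied from the release note above.
WHEEL_URL = (
    "https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.3.0/cu121/"
    "2024-05-23/smdistributed_dataparallel-2.3.0-cp311-cp311-linux_x86_64.whl"
)


def wheel_tags(url: str) -> dict:
    """Split a wheel file name into its name, version, and compatibility tags."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    stem = filename[: -len(".whl")]
    # Name and version come first; the last three fields are the python, ABI,
    # and platform tags. An optional build tag may sit in between.
    name, version, *_build, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(python_tag: str) -> bool:
    """True if a CPython tag such as "cp311" matches the running interpreter."""
    return python_tag == f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = wheel_tags(WHEEL_URL)
if matches_interpreter(tags["python"]):
    # Command one could run, for example with subprocess.run(..., check=True).
    print([sys.executable, "-m", "pip", "install", WHEEL_URL])
else:
    print(f"Wheel targets {tags['python']}; this interpreter does not match.")
```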
Other changes
-
The SMDDP library v2.2.0 is integrated into the SageMaker framework container for PyTorch v2.2.0.
The SageMaker distributed data parallelism library v2.2.0
Date: March 4, 2024
New features
-
Added support for PyTorch v2.2.0 with CUDA v12.1.
Integration into Docker containers distributed by the SageMaker model parallelism (SMP) library
This version of the SMDDP library is migrated to The SageMaker model parallelism library v2.2.0.
658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.2.0-gpu-py310-cu121
For Regions where the SMP Docker images are available, see AWS Regions.
Binary file of this release
You can download or install the library using the following URL.
https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.2.0/cu121/2024-03-04/smdistributed_dataparallel-2.2.0-cp310-cp310-linux_x86_64.whl
The SageMaker distributed data parallelism library v2.1.0
Date: March 1, 2024
New features
-
Added support for PyTorch v2.1.0 with CUDA v12.1.
Bug fixes
-
Fixed the CPU memory leak issue in SMDDP v2.0.1.
Integration into SageMaker Framework Containers
This version of the SMDDP library passed benchmark testing and is migrated to the following SageMaker Framework Container:
-
PyTorch v2.1.0
763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.1.0-gpu-py310-cu121-ubuntu20.04-sagemaker
Integration into Docker containers distributed by the SageMaker model parallelism (SMP) library
This version of the SMDDP library is migrated to The SageMaker model parallelism library v2.1.0.
658645717510.dkr.ecr.<region>.amazonaws.com/smdistributed-modelparallel:2.1.2-gpu-py310-cu121
For Regions where the SMP Docker images are available, see AWS Regions.
Binary file of this release
You can download or install the library using the following URL.
https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.1.0/cu121/2024-02-04/smdistributed_dataparallel-2.1.0-cp310-cp310-linux_x86_64.whl
The SageMaker distributed data parallelism library v2.0.1
Date: December 7, 2023
New features
-
Added a new SMDDP implementation of the AllGather collective operation optimized for AWS compute resources and network infrastructure. To learn more, see SMDDP AllGather collective operation.
-
The SMDDP AllGather collective operation is compatible with PyTorch FSDP and DeepSpeed. To learn more, see Use the SMDDP library in your PyTorch training script.
-
Added support for PyTorch v2.0.1.
Known issues
-
There is a CPU memory leak: CPU memory usage increases gradually while training with SMDDP AllReduce in DDP mode.
Integration into SageMaker Framework Containers
This version of the SMDDP library passed benchmark testing and is migrated to the following SageMaker Framework Container:
-
PyTorch v2.0.1
763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.1-gpu-py310-cu118-ubuntu20.04-sagemaker
Binary file of this release
You can download or install the library using the following URL.
https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.0.1/cu118/2023-12-07/smdistributed_dataparallel-2.0.2-cp310-cp310-linux_x86_64.whl
Other changes
-
Starting from this release, documentation for the SMDDP library is fully available in this Amazon SageMaker Developer Guide. Because the complete developer guide for SMDDP v2 is now housed in the Amazon SageMaker Developer Guide, the additional reference for SMDDP v1.x in the SageMaker Python SDK documentation is no longer supported. If you still need SMDDP v1.x documentation, see the snapshot of the documentation at SageMaker Python SDK v2.212.0 documentation.