Core features of the SageMaker AI model parallelism library v2

The Amazon SageMaker AI model parallelism library v2 (SMP v2) offers distribution strategies and memory-saving techniques, such as sharded data parallelism, tensor parallelism, and checkpointing. These strategies and techniques help distribute large models across multiple devices while optimizing training speed and memory consumption. SMP v2 also provides a Python package, torch.sagemaker, that helps you adapt your training script with only a few lines of code changes.
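As a rough illustration of how these strategies are switched on, the following sketch builds the kind of dictionary that is passed as the distribution argument of the SageMaker PyTorch estimator. The helper name build_smp_distribution and the degree values are hypothetical and chosen only for illustration; the nested keys are assumed to match the SMP v2 configuration format, so check them against the SMP v2 configuration reference before use.

```python
# Minimal sketch, not a definitive configuration.
# build_smp_distribution is a hypothetical helper introduced for this example.

def build_smp_distribution(hybrid_shard_degree, tensor_parallel_degree):
    """Return a dict shaped like the `distribution` argument of the
    SageMaker PyTorch estimator with SMP v2 enabled (assumed key names)."""
    return {
        # SMP v2 jobs are launched through torch.distributed.
        "torch_distributed": {"enabled": True},
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    # Degree of sharded data parallelism (illustrative value).
                    "hybrid_shard_degree": hybrid_shard_degree,
                    # Degree of tensor parallelism (illustrative value).
                    "tensor_parallel_degree": tensor_parallel_degree,
                },
            }
        },
    }


distribution = build_smp_distribution(hybrid_shard_degree=8, tensor_parallel_degree=2)

# Inside the training script, which runs only in an SMP v2 container,
# the library is typically initialized with:
#   import torch.sagemaker as tsm
#   tsm.init()
```

The resulting dictionary would be passed to the estimator together with the adapted training script; the degrees you choose depend on the model size and the number of GPUs in the job.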

This guide follows the basic two-step flow introduced in Use the SageMaker AI model parallelism library v2. To dive deep into the core features of SMP v2 and how to use them, see the following topics.

Note

These core features are available in SMP v2.0.0 and later and the SageMaker Python SDK v2.200.0 and later, and work with PyTorch v2.0.1 and later. To check the package versions, see Supported frameworks and AWS Regions.