You can use any of the SageMaker AI interfaces to run a training job with SageMaker Training Compiler: Amazon SageMaker Studio Classic, Amazon SageMaker notebook instances, AWS SDK for Python (Boto3), and AWS Command Line Interface.
Using the SageMaker Python SDK
SageMaker Training Compiler for PyTorch is available through the SageMaker AI PyTorch and HuggingFace framework estimator classes. To turn on SageMaker Training Compiler, add the compiler_config parameter to the SageMaker AI estimators: import the TrainingCompilerConfig class and pass an instance of it to the compiler_config parameter. The following code examples show the structure of SageMaker AI estimator classes with SageMaker Training Compiler turned on.
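As a minimal sketch, a PyTorch estimator with the compiler turned on might look like the following. The entry point, source directory, IAM role, instance type, framework versions, and hyperparameter values are placeholders; substitute values supported in your account (see Supported Frameworks for valid versions).

```python
# Sketch: SageMaker PyTorch estimator with SageMaker Training Compiler enabled.
# Requires the SageMaker Python SDK v2.121.0 or later.
from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

pytorch_estimator = PyTorch(
    entry_point="train.py",            # placeholder: your training script
    source_dir="src",                  # optional; put requirements.txt here if needed
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.p3.2xlarge",     # placeholder: a supported GPU instance
    framework_version="1.13.1",        # placeholder: see Supported Frameworks
    py_version="py39",
    hyperparameters={"n_gpus": 1, "batch_size": 64, "learning_rate": 5e-4},
    compiler_config=TrainingCompilerConfig(),  # turns on the compiler
)

pytorch_estimator.fit()
```

The Hugging Face estimator follows the same structure, with transformers_version and pytorch_version in place of framework_version.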
Tip
To get started with prebuilt models provided by PyTorch or Transformers, try using the batch sizes provided in the reference table at Tested Models.
Note
The native PyTorch support is available in the SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK accordingly.
Note
Starting with PyTorch v1.12.0, SageMaker Training Compiler containers for PyTorch are available. Note that
the SageMaker Training Compiler containers for PyTorch are not prepackaged with Hugging Face
Transformers. If you need to install the library in the container, make sure that
you add the requirements.txt
file under the source directory when
submitting a training job.
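For example, a source directory with a requirements.txt could be prepared as follows. The directory name, script name, and pinned Transformers version are assumptions for illustration; pin a version supported by SageMaker Training Compiler (see Supported Frameworks).

```shell
# Hypothetical layout: a source directory holding the training script and a
# requirements.txt, so the container installs Transformers at job start.
mkdir -p src
touch src/train.py                         # placeholder training script
printf 'transformers==4.21.1\n' > src/requirements.txt   # example version pin
ls src
```

Pass this directory to the estimator's source_dir parameter when submitting the training job.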
For PyTorch v1.11.0 and before, use the previous versions of the SageMaker Training Compiler containers for Hugging Face and PyTorch.
For a complete list of framework versions and corresponding container information, see Supported Frameworks.
For information that fits your use case, see one of the following options.
The following list is the minimal set of parameters required to run a SageMaker training job with the compiler.
Note
When using the SageMaker AI Hugging Face estimator, you must specify the transformers_version, pytorch_version, hyperparameters, and compiler_config parameters to enable SageMaker Training Compiler. You cannot use image_uri to manually specify the Training Compiler integrated Deep Learning Containers that are listed at Supported Frameworks.
- entry_point (str) – Required. Specify the file name of your training script.
  Note: To run a distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, specify the file name of a launcher script to this parameter. The launcher script should be prepared to wrap your training script and configure the distributed training environment. For more information, see the SageMaker Training Compiler example notebooks.
- source_dir (str) – Optional. Add this if you need to install additional packages. To install packages, you need to prepare a requirements.txt file under this directory.
- instance_count (int) – Required. Specify the number of instances.
- instance_type (str) – Required. Specify the instance type.
- transformers_version (str) – Required only when using the SageMaker AI Hugging Face estimator. Specify the Hugging Face Transformers library version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.
- framework_version or pytorch_version (str) – Required. Specify the PyTorch version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.
  Note: When using the SageMaker AI Hugging Face estimator, you must specify both transformers_version and pytorch_version.
- hyperparameters (dict) – Optional. Specify hyperparameters for the training job, such as n_gpus, batch_size, and learning_rate. When you enable SageMaker Training Compiler, try larger batch sizes and adjust the learning rate accordingly. To find case studies of using the compiler and adjusted batch sizes to improve training speed, see Tested Models and SageMaker Training Compiler Example Notebooks and Blogs.
  Note: To run a distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, you need to add an additional parameter, "training_script", to specify your training script.
- compiler_config (TrainingCompilerConfig object) – Required to activate SageMaker Training Compiler. Include this parameter to turn on SageMaker Training Compiler. The following are parameters for the TrainingCompilerConfig class.
  - enabled (bool) – Optional. Specify True or False to turn on or turn off SageMaker Training Compiler. The default value is True.
  - debug (bool) – Optional. To receive more detailed training logs from your compiler-accelerated training jobs, change it to True. However, the additional logging might add overhead and slow down the compiled training job. The default value is False.
- distribution (dict) – Optional. To run a distributed training job with SageMaker Training Compiler, add distribution = { 'pytorchxla' : { 'enabled': True }}.
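Putting the parameters above together, a Hugging Face estimator for a compiler-accelerated distributed training job might be sketched as follows. The role, instance type, version pair, and hyperparameter values are placeholders; pick a transformers_version/pytorch_version combination listed at Supported Frameworks.

```python
# Sketch: Hugging Face estimator combining the parameters described above,
# including the distribution dict for distributed training with the compiler.
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

hf_estimator = HuggingFace(
    entry_point="train.py",            # placeholder training script
    instance_count=2,
    instance_type="ml.p3.16xlarge",    # placeholder GPU instance type
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder
    transformers_version="4.21.1",     # placeholder: see Supported Frameworks
    pytorch_version="1.11.0",          # placeholder: must pair with the above
    hyperparameters={"n_gpus": 8, "batch_size": 64, "learning_rate": 1e-4},
    compiler_config=TrainingCompilerConfig(),
    distribution={"pytorchxla": {"enabled": True}},  # distributed training
)

hf_estimator.fit()
```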
Warning
If you turn on SageMaker Debugger, it might impact the performance of SageMaker Training Compiler. We recommend that you turn off Debugger when running SageMaker Training Compiler to make sure there's no impact on performance. For more information, see Considerations. To turn the Debugger functionalities off, add the following two arguments to the estimator:
disable_profiler=True,
debugger_hook_config=False
If the training job with the compiler is launched successfully, you receive the following logs during the job initialization phase:
- With TrainingCompilerConfig(debug=False):

  Found configuration for Training Compiler
  Configuring SM Training Compiler...

- With TrainingCompilerConfig(debug=True):

  Found configuration for Training Compiler
  Configuring SM Training Compiler...
  Training Compiler set to debug mode
Using the SageMaker AI CreateTrainingJob API Operation
SageMaker Training Compiler configuration options must be specified through the AlgorithmSpecification and HyperParameters fields in the request syntax for the CreateTrainingJob API operation.
"AlgorithmSpecification": {
    "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>"
},
"HyperParameters": {
    "sagemaker_training_compiler_enabled": "true",
    "sagemaker_training_compiler_debug_mode": "false",
    "sagemaker_pytorch_xla_multi_worker_enabled": "false" // set to "true" for distributed training
}
To find a complete list of deep learning container image URIs that have SageMaker Training Compiler implemented, see Supported Frameworks.
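When building the CreateTrainingJob request programmatically (for example, before passing it to Boto3's create_training_job), the compiler flags go in as string-valued hyperparameters. The following sketch assembles that part of the request; the helper function name is hypothetical, and the image URI is the same placeholder used above.

```python
import json

def compiler_hyperparameters(debug=False, distributed=False):
    """Hypothetical helper: build the HyperParameters entries that turn on
    SageMaker Training Compiler in a CreateTrainingJob request.
    Note that all values must be strings."""
    return {
        "sagemaker_training_compiler_enabled": "true",
        "sagemaker_training_compiler_debug_mode": "true" if debug else "false",
        "sagemaker_pytorch_xla_multi_worker_enabled": (
            "true" if distributed else "false"
        ),
    }

request = {
    "AlgorithmSpecification": {
        # Placeholder: use a Training Compiler DLC image URI from Supported Frameworks
        "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>",
    },
    "HyperParameters": compiler_hyperparameters(distributed=True),
}

print(json.dumps(request, indent=2))
```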