Learn how to turn on SageMaker Training Compiler using the SageMaker Python SDK or the SageMaker CreateTrainingJob API operation.

Run PyTorch Training Jobs with SageMaker Training Compiler

You can use any of the SageMaker interfaces to run a training job with SageMaker Training Compiler: Amazon SageMaker Studio Classic, Amazon SageMaker notebook instances, AWS SDK for Python (Boto3), and AWS Command Line Interface.

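Topics

Using the SageMaker Python SDK
Using the SageMaker CreateTrainingJob API Operation

Using the SageMaker Python SDK

SageMaker Training Compiler for PyTorch is available through the SageMaker PyTorch and HuggingFace framework estimator classes. To turn on SageMaker Training Compiler, add the compiler_config parameter to the SageMaker estimator. Import the TrainingCompilerConfig class and pass an instance of it to the compiler_config parameter. The following code examples show the structure of the SageMaker estimator classes with SageMaker Training Compiler turned on.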

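Tip

To get started with prebuilt models provided by PyTorch or Transformers, try using the batch sizes provided in the reference table at Tested Models.

Note

Native PyTorch support is available in SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK accordingly.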
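The version requirement in the note above can be checked before you configure an estimator. The following is a minimal sketch, not part of the SageMaker SDK: the helper names (parse_version, sdk_is_recent_enough, installed_sdk_version) are hypothetical, and the parser assumes plain numeric release strings such as 2.121.0.

```python
from importlib import metadata

# Minimum SageMaker Python SDK version stated in the note above.
MIN_SDK_VERSION = (2, 121, 0)


def parse_version(text):
    """Turn a plain 'X.Y.Z' release string into a tuple of integers.

    Assumption: the string has no pre-release suffix such as 'rc1'.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def sdk_is_recent_enough(installed):
    """Return True when the given version meets the documented minimum."""
    return parse_version(installed) >= MIN_SDK_VERSION


def installed_sdk_version():
    """Look up the installed sagemaker package, or None if it is absent."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sdk_version()
    if version is None:
        print("sagemaker is not installed; try: pip install --upgrade sagemaker")
    elif sdk_is_recent_enough(version):
        print(f"sagemaker {version} meets the v2.121.0 minimum")
    else:
        print(f"sagemaker {version} is too old; upgrade to v2.121.0 or later")
```

Comparing integer tuples rather than raw strings avoids the common mistake of ranking "2.99.0" above "2.121.0".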

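Note

Starting with PyTorch v1.12.0, SageMaker Training Compiler containers for PyTorch are available. Note that the SageMaker Training Compiler containers for PyTorch are not prepackaged with Hugging Face Transformers. If you need to install the library in the container, make sure that you add a requirements.txt file under the source directory when you submit a training job.

For PyTorch v1.11.0 and before, use the previous versions of the SageMaker Training Compiler containers for Hugging Face and PyTorch.

For a complete list of framework versions and corresponding container information, see Supported Frameworks.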
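The version-specific options described on this page can be summarized as a small lookup. The sketch below is illustrative only: choose_setup is a hypothetical helper, the returned strings are labels for a reader rather than SDK values, and the Supported Frameworks list remains the authority on which versions are actually available.

```python
def parse_version(text):
    """Turn a plain numeric version string such as '1.13.1' into a tuple."""
    return tuple(int(part) for part in text.split("."))


def choose_setup(pytorch_version, distributed=False):
    """Return (estimator class name, launch approach) for a PyTorch version.

    The mapping follows the options on this page:
    - v1.12.0 and later: SageMaker PyTorch estimator
    - v1.11.0 and before: SageMaker HuggingFace estimator
    - distributed, v1.11 and later: pytorchxla distribution option
    - distributed, v1.10.2 and before: launcher script as entry_point
    """
    version = parse_version(pytorch_version)
    estimator = "PyTorch" if version >= (1, 12) else "HuggingFace"

    if not distributed:
        launch = "training script as entry_point"
    elif version >= (1, 11):
        launch = "distribution={'pytorchxla': {'enabled': True}}"
    else:
        launch = "launcher script as entry_point"
    return estimator, launch


if __name__ == "__main__":
    for version, distributed in [("1.13.1", False), ("1.11.0", True), ("1.10.2", True)]:
        print(version, distributed, choose_setup(version, distributed))
```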

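For information that fits your use case, see one of the following options.

PyTorch v1.12.0 and later

To compile and train a PyTorch model, configure a SageMaker PyTorch estimator with SageMaker Training Compiler as shown in the following code example.

Note

This native PyTorch support is available in SageMaker Python SDK v2.120.0 and later. Make sure that you update the SageMaker Python SDK.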


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='train.py',
    source_dir='path-to-requirements-file', # Optional. Add this if you need to install additional packages.
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()

Hugging Face Transformers with PyTorch v1.11.0 and before

To compile and train a Transformers model with PyTorch, configure a SageMaker Hugging Face estimator with SageMaker Training Compiler as shown in the following code example.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# the original max batch size that can fit into GPU memory without compiler
batch_size_native=12
learning_rate_native=float('5e-5')

# an updated max batch size that can fit into GPU memory with compiler
batch_size=64

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size

hyperparameters={
    "n_gpus": 1,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='train.py',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

To prepare your training script, see the following pages.

To find end-to-end examples, see the following notebooks:

PyTorch v1.12

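For PyTorch v1.12, you can run distributed training with SageMaker Training Compiler by adding the pytorchxla option to the distribution parameter of the SageMaker PyTorch estimator class.

Note

This native PyTorch support is available in SageMaker Python SDK v2.121.0 and later. Make sure that you update the SageMaker Python SDK.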


from sagemaker.pytorch import PyTorch, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_estimator=PyTorch(
    entry_point='your_training_script.py',
    source_dir='path-to-requirements-file', # Optional. Add this if you need to install additional packages.
    instance_count=instance_count,
    instance_type=instance_type,
    framework_version='1.13.1',
    py_version='py3',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution ={'pytorchxla' : { 'enabled': True }},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_estimator.fit()
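The learning-rate adjustment used in the code samples on this page scales the native learning rate by the ratio of the new batch size to the native batch size and, for distributed training, by the total number of GPUs. The sketch below wraps that arithmetic in a function; scale_learning_rate is a hypothetical helper name, and linear scaling is only a starting point that may still need tuning for your model.

```python
def scale_learning_rate(lr_native, batch_size_native, batch_size,
                        num_gpus=1, instance_count=1):
    """Scale a learning rate linearly, mirroring the formula in the samples.

    lr_native and batch_size_native describe the run without the compiler;
    batch_size is the larger per-GPU batch size that fits with the compiler.
    """
    if batch_size_native <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    if num_gpus <= 0 or instance_count <= 0:
        raise ValueError("num_gpus and instance_count must be positive")
    per_gpu = lr_native / batch_size_native * batch_size
    return per_gpu * num_gpus * instance_count


if __name__ == "__main__":
    # Values taken from the single-GPU sample (12 -> 64 on one GPU).
    single_gpu = scale_learning_rate(5e-5, 12, 64)
    # Values taken from the distributed sample (16 -> 26 on 4 GPUs, 1 instance).
    multi_gpu = scale_learning_rate(5e-5, 16, 26, num_gpus=4, instance_count=1)
    print(f"single GPU learning rate: {single_gpu:.3e}")
    print(f"4-GPU learning rate:      {multi_gpu:.3e}")
```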

Tip

To prepare your training script, see PyTorch.

Transformers v4.21 with PyTorch v1.11

For PyTorch v1.11 and later, SageMaker Training Compiler is available for distributed training with the pytorchxla option specified for the distribution parameter.


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='your_training_script.py',
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.21.1',
    pytorch_version='1.11.0',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    distribution ={'pytorchxla' : { 'enabled': True }},
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

Tip

To prepare your training script, see the following pages.

Transformers v4.17 with PyTorch v1.10.2 and before

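For the supported versions of PyTorch v1.10.2 and before, SageMaker Training Compiler requires an alternate mechanism for launching a distributed training job. To run distributed training, SageMaker Training Compiler requires you to pass a SageMaker distributed training launcher script to the entry_point argument, and to pass your training script through the hyperparameters argument. The following code example shows how to configure a SageMaker Hugging Face estimator with the required changes applied.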


from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

# choose an instance type, specify the number of instances you want to use,
# and set the num_gpus variable to the number of GPUs per instance.
instance_count=1
instance_type='ml.p3.8xlarge'
num_gpus=4

# the original max batch size that can fit to GPU memory without compiler
batch_size_native=16
learning_rate_native=float('5e-5')

# an updated max batch size that can fit to GPU memory with compiler
batch_size=26

# update learning rate
learning_rate=learning_rate_native/batch_size_native*batch_size*num_gpus*instance_count

training_script="your_training_script.py"

hyperparameters={
    "n_gpus": num_gpus,
    "batch_size": batch_size,
    "learning_rate": learning_rate,
    "training_script": training_script     # Specify the file name of your training script.
}

pytorch_huggingface_estimator=HuggingFace(
    entry_point='distributed_training_launcher.py',    # Specify the distributed training launcher script.
    instance_count=instance_count,
    instance_type=instance_type,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    hyperparameters=hyperparameters,
    compiler_config=TrainingCompilerConfig(),
    disable_profiler=True,
    debugger_hook_config=False
)

pytorch_huggingface_estimator.fit()

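The launcher script should look like the following. It wraps your training script and configures the distributed training environment depending on the size of the training instance of your choice.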


#!/bin/python
# distributed_training_launcher.py

import subprocess
import sys

if __name__ == "__main__":
    # Forward every command-line argument to the launcher module unchanged.
    arguments_command = " ".join(sys.argv[1:])
    # The following line takes care of setting up inter-node communication
    # as well as managing intra-node workers for each GPU.
    subprocess.check_call("python -m torch_xla.distributed.sm_dist " + arguments_command, shell=True)
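The launcher above joins its arguments into a single shell string. As an alternative sketch (not the documented launcher), the command can be built as an argument list so that values containing spaces or shell metacharacters are passed through intact. The torch_xla.distributed.sm_dist module name is taken from the launcher above; build_launch_command is a hypothetical helper, and the example arguments assume that the hyperparameters arrive as --name value pairs on the command line.

```python
import shlex
import sys

# Module named in the launcher script above.
LAUNCHER_MODULE = "torch_xla.distributed.sm_dist"


def build_launch_command(script_args, python="python"):
    """Return the launcher invocation as an argument list.

    Passing a list to subprocess.check_call avoids shell=True and the
    quoting problems that come with concatenating a command string.
    """
    return [python, "-m", LAUNCHER_MODULE, *script_args]


if __name__ == "__main__":
    command = build_launch_command(sys.argv[1:])
    # shlex.join quotes each argument, so the printed line is safe to copy.
    print(shlex.join(command))
    # To launch for real, replace the print with:
    #   import subprocess; subprocess.check_call(command)
```

For example, build_launch_command(["--training_script", "your_training_script.py"]) yields the same invocation as the launcher above, with each argument kept as a separate list element.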

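Tip

For instructions on preparing your training script, see the following pages.

Tip

To find end-to-end examples, see the following notebooks:

The following list is the minimal set of parameters required to run a SageMaker training job with the compiler.

Note

When using the SageMaker Hugging Face estimator, you must specify the transformers_version, pytorch_version, hyperparameters, and compiler_config parameters to enable SageMaker Training Compiler. You cannot use image_uri to manually specify the Training Compiler integrated Deep Learning Containers that are listed at Supported Frameworks.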

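entry_point (str) – Required. Specify the file name of your training script.
Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, specify the file name of a launcher script for this parameter. The launcher script should be prepared to wrap your training script and configure the distributed training environment. For more information, see the following example notebooks:
- Compile and Train the GPT2 Model using the Transformers Trainer API with the SST2 Dataset for Single-Node Multi-GPU Training
- Compile and Train the GPT2 Model using the Transformers Trainer API with the SST2 Dataset for Multi-Node Multi-GPU Training
source_dir (str) – Optional. Add this if you need to install additional packages. To install packages, prepare a requirements.txt file under this directory.
instance_count (int) – Required. Specify the number of instances.
instance_type (str) – Required. Specify the instance type.
transformers_version (str) – Required only when using the SageMaker Hugging Face estimator. Specify the Hugging Face Transformers library version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.
framework_version or pytorch_version (str) – Required. Specify the PyTorch version supported by SageMaker Training Compiler. To find available versions, see Supported Frameworks.

Note
When using the SageMaker Hugging Face estimator, you must specify both transformers_version and pytorch_version.
hyperparameters (dict) – Optional. Specify hyperparameters for the training job, such as n_gpus, batch_size, and learning_rate. When you enable SageMaker Training Compiler, try larger batch sizes and adjust the learning rate accordingly. To find case studies of using the compiler and adjusting batch sizes to improve training speed, see Tested Models and SageMaker Training Compiler Example Notebooks and Blogs.

Note
To run distributed training with SageMaker Training Compiler and PyTorch v1.10.2 and before, you need to add an additional parameter, "training_script", to specify your training script, as shown in the preceding code example.
compiler_config (TrainingCompilerConfig object) – Required to activate SageMaker Training Compiler. Include this parameter to turn on SageMaker Training Compiler. The following are parameters for the TrainingCompilerConfig class.
- enabled (bool) – Optional. Specify True or False to turn SageMaker Training Compiler on or off. The default value is True.
- debug (bool) – Optional. To receive more detailed training logs from your compiler-accelerated training jobs, change it to True. However, the additional logging might add overhead and slow down the compiled training job. The default value is False.
distribution (dict) – Optional. To run a distributed training job with SageMaker Training Compiler, add distribution = { 'pytorchxla' : { 'enabled': True }}.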
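The requirements for the SageMaker Hugging Face estimator can be checked before a job is submitted. The sketch below is a hypothetical pre-flight helper, not an SDK feature: check_huggingface_args inspects a plain dictionary of the keyword arguments you intend to pass and reports anything that conflicts with the note at the start of this list.

```python
# Arguments that the note on this page lists as mandatory for the
# SageMaker Hugging Face estimator with Training Compiler.
REQUIRED_HUGGINGFACE_ARGS = (
    "transformers_version",
    "pytorch_version",
    "hyperparameters",
    "compiler_config",
)


def check_huggingface_args(estimator_kwargs):
    """Return a list of problems found in the planned estimator arguments."""
    problems = [
        f"missing required argument: {name}"
        for name in REQUIRED_HUGGINGFACE_ARGS
        if estimator_kwargs.get(name) is None
    ]
    # The page states that image_uri cannot be used to select a
    # Training Compiler container manually.
    if "image_uri" in estimator_kwargs:
        problems.append("image_uri cannot be used with Training Compiler")
    return problems


if __name__ == "__main__":
    planned = {
        "entry_point": "train.py",
        "instance_count": 1,
        "instance_type": "ml.p3.2xlarge",
        "transformers_version": "4.21.1",
        "pytorch_version": "1.11.0",
        "hyperparameters": {"batch_size": 64},
        # compiler_config is deliberately left out to show a reported problem.
    }
    for problem in check_huggingface_args(planned):
        print(problem)
```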

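Warning

If you turn on SageMaker Debugger, it might impact the performance of SageMaker Training Compiler. We recommend that you turn off Debugger when running SageMaker Training Compiler to make sure there is no impact on performance. For more information, see Considerations. To turn the Debugger functionalities off, add the following two arguments to the estimator: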


disable_profiler=True,
debugger_hook_config=False

If the training job with the compiler launches successfully, you receive the following logs during the job initialization phase:

With TrainingCompilerConfig(debug=False)


Found configuration for Training Compiler
Configuring SM Training Compiler...

With TrainingCompilerConfig(debug=True)


Found configuration for Training Compiler
Configuring SM Training Compiler...
Training Compiler set to debug mode
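If you save the job logs to a text file, the initialization messages above can be searched for automatically. The following is a minimal sketch: compiler_status is a hypothetical helper, and it uses substring matching on the assumption that your log viewer may add timestamps or other prefixes to each line.

```python
# Messages quoted from the initialization logs shown above.
REQUIRED_MARKERS = (
    "Found configuration for Training Compiler",
    "Configuring SM Training Compiler...",
)
DEBUG_MARKER = "Training Compiler set to debug mode"


def compiler_status(log_text):
    """Classify a log excerpt as 'not detected', 'enabled', or 'enabled (debug)'."""
    if not all(marker in log_text for marker in REQUIRED_MARKERS):
        return "not detected"
    if DEBUG_MARKER in log_text:
        return "enabled (debug)"
    return "enabled"


if __name__ == "__main__":
    sample = "\n".join([
        "Found configuration for Training Compiler",
        "Configuring SM Training Compiler...",
        "Training Compiler set to debug mode",
    ])
    print(compiler_status(sample))
```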

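Using the SageMaker `CreateTrainingJob` API Operation

SageMaker Training Compiler configuration options must be specified through the AlgorithmSpecification and HyperParameters fields in the request syntax for the CreateTrainingJob API operation.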


"AlgorithmSpecification": {
    "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>"
},

"HyperParameters": {
    "sagemaker_training_compiler_enabled": "true",
    "sagemaker_training_compiler_debug_mode": "false",
    "sagemaker_pytorch_xla_multi_worker_enabled": "false"    // set to "true" for distributed training
}

To find a complete list of deep learning container image URIs that have SageMaker Training Compiler implemented, see Supported Frameworks.
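The request fields above can also be assembled programmatically. The sketch below builds only the two compiler-related fields; compiler_hyperparameters is a hypothetical helper, the image value is the placeholder from the request syntax rather than a real URI, and a complete CreateTrainingJob request needs further required fields that are not shown here.

```python
import json


def compiler_hyperparameters(enabled=True, debug=False, distributed=False):
    """Build the Training Compiler hyperparameters as lowercase strings."""

    def flag(value):
        # HyperParameters values are strings, so booleans become "true"/"false".
        return "true" if value else "false"

    return {
        "sagemaker_training_compiler_enabled": flag(enabled),
        "sagemaker_training_compiler_debug_mode": flag(debug),
        "sagemaker_pytorch_xla_multi_worker_enabled": flag(distributed),
    }


# Fragment of a request body; merge it into a full CreateTrainingJob request.
request_fragment = {
    "AlgorithmSpecification": {
        # Placeholder from the request syntax above; substitute a real image URI.
        "TrainingImage": "<sagemaker-training-compiler-enabled-dlc-image>",
    },
    "HyperParameters": compiler_hyperparameters(distributed=True),
}

if __name__ == "__main__":
    print(json.dumps(request_fragment, indent=2))
```

Generating the values from booleans keeps the three flags consistently formatted and avoids passing Python True or False where the API expects strings.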
