Step 2: Launch and debug training jobs using the SageMaker Python SDK
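To configure a SageMaker estimator with SageMaker Debugger, use the Amazon SageMaker Python SDK and specify the Debugger-specific parameters. To make full use of the debugging functionality, you need to configure three parameters: debugger_hook_config, tensorboard_output_config, and rules.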
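As a rough illustration of what these three parameters can look like once filled in, here is a minimal sketch assuming version 2 of the SageMaker Python SDK. The helper name build_debugger_params, the S3 prefix, the "losses" collection with a save interval of 50, and the choice of the loss_not_decreasing and vanishing_gradient built-in rules are all assumptions made for this example, not values prescribed by this guide.

```python
def build_debugger_params(s3_prefix):
    """Build the three Debugger-specific estimator arguments.

    s3_prefix is a hypothetical S3 URI prefix chosen by the caller,
    for example "s3://amzn-s3-demo-bucket/debugger-demo".
    """
    # Imported inside the function so this module still loads on a machine
    # where the SageMaker Python SDK is not installed.
    from sagemaker.debugger import (
        CollectionConfig,
        DebuggerHookConfig,
        Rule,
        TensorBoardOutputConfig,
        rule_configs,
    )

    # Hook configuration: which tensor collections to save, and where.
    hook_config = DebuggerHookConfig(
        s3_output_path=f"{s3_prefix}/debug-output",
        collection_configs=[
            # Illustrative choice: save the "losses" collection every 50 steps.
            CollectionConfig(name="losses", parameters={"save_interval": "50"}),
        ],
    )

    # Optional: where TensorBoard-compatible output is written.
    tensorboard_config = TensorBoardOutputConfig(
        s3_output_path=f"{s3_prefix}/tensorboard-output",
    )

    # Built-in rules that analyze the saved tensors (illustrative selection).
    rules = [
        Rule.sagemaker(rule_configs.loss_not_decreasing()),
        Rule.sagemaker(rule_configs.vanishing_gradient()),
    ]

    return {
        "debugger_hook_config": hook_config,
        "tensorboard_output_config": tensorboard_config,
        "rules": rules,
    }
```

The returned dictionary can be unpacked into any estimator constructor with `**build_debugger_params(...)`, since all three keys match estimator keyword arguments.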
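Construct a SageMaker estimator with Debugger-specific parameters
The code examples in this section show how to construct a SageMaker estimator with the Debugger-specific parameters.
The following code examples are templates for constructing the SageMaker framework estimators and are not directly executable. Continue to the following sections to configure the Debugger-specific parameters.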
- PyTorch
# An example of constructing a SageMaker PyTorch estimator
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

session = boto3.session.Session()
region = session.region_name

# Template placeholders: replace `...` and built_in_rule() with your own configuration.
debugger_hook_config = DebuggerHookConfig(...)
rules = [
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator = PyTorch(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.12.0",
    py_version="py37",
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

# Start the training job without blocking until it finishes.
estimator.fit(wait=False)
- TensorFlow
# An example of constructing a SageMaker TensorFlow estimator
import boto3
import sagemaker
from sagemaker.tensorflow import TensorFlow
# ProfilerRule is imported because it is used in the rules list below.
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, ProfilerRule, Rule, rule_configs

session = boto3.session.Session()
region = session.region_name

# Template placeholders: replace `...`, built_in_rule(), and BuiltInRule() with your own configuration.
debugger_hook_config = DebuggerHookConfig(...)
rules = [
    Rule.sagemaker(rule_configs.built_in_rule()),
    ProfilerRule.sagemaker(rule_configs.BuiltInRule())
]

estimator = TensorFlow(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.9.0",
    py_version="py39",
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

# Start the training job without blocking until it finishes.
estimator.fit(wait=False)
- MXNet
# An example of constructing a SageMaker MXNet estimator
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

# Template placeholders: replace `...` and built_in_rule() with your own configuration.
debugger_hook_config = DebuggerHookConfig(...)
rules = [
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator = MXNet(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.7.0",
    py_version="py37",
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

# Start the training job without blocking until it finishes.
estimator.fit(wait=False)
- XGBoost
# An example of constructing a SageMaker XGBoost estimator
import sagemaker
from sagemaker.xgboost.estimator import XGBoost
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

# Template placeholders: replace `...` and built_in_rule() with your own configuration.
debugger_hook_config = DebuggerHookConfig(...)
rules = [
    Rule.sagemaker(rule_configs.built_in_rule())
]

estimator = XGBoost(
    entry_point="directory/to/your_training_script.py",
    role=sagemaker.get_execution_role(),
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.5-1",
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

# Start the training job without blocking until it finishes.
estimator.fit(wait=False)
- Generic estimator
# An example of constructing a SageMaker generic estimator using the XGBoost algorithm base image
import boto3
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker import image_uris
from sagemaker.debugger import CollectionConfig, DebuggerHookConfig, Rule, rule_configs

# Template placeholders: replace `...` and built_in_rule() with your own configuration.
debugger_hook_config = DebuggerHookConfig(...)
rules = [
    Rule.sagemaker(rule_configs.built_in_rule())
]

# Look up the XGBoost container image for the current Region.
region = boto3.Session().region_name
xgboost_container = image_uris.retrieve("xgboost", region, "1.5-1")

estimator = Estimator(
    role=sagemaker.get_execution_role(),
    image_uri=xgboost_container,
    base_job_name="debugger-demo",
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    # Debugger-specific parameters
    debugger_hook_config=debugger_hook_config,
    rules=rules
)

# Start the training job without blocking until it finishes.
estimator.fit(wait=False)
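To activate SageMaker Debugger, configure the Debugger-specific parameters passed to the estimator: debugger_hook_config, rules, and, optionally, tensorboard_output_config.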
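SageMaker Debugger securely saves output tensors in subfolders of an S3 bucket. For example, the default S3 bucket URI in your account has the format s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/. SageMaker Debugger creates two subfolders: debug-output and rule-output. If you add the tensorboard_output_config parameter, you also find a tensorboard-output folder.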
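To make the folder layout concrete, the following sketch assembles the expected output URIs from the pattern quoted above. The function name debugger_output_uris and the sample Region, account ID, and job name are hypothetical, and the sketch only follows the URI format stated in this section; it does not query S3 or the SageMaker API.

```python
def debugger_output_uris(region, account_id, base_job_name, with_tensorboard=False):
    """Return the Debugger output URIs expected under the default S3 bucket.

    Follows the documented pattern
    s3://sagemaker-<region>-<12digit_account_id>/<base-job-name>/<debugger-subfolders>/
    """
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise ValueError("account_id must be a 12-digit string")

    root = f"s3://sagemaker-{region}-{account_id}/{base_job_name}"

    # Debugger always creates these two subfolders.
    subfolders = ["debug-output", "rule-output"]
    # The third appears only when tensorboard_output_config is set.
    if with_tensorboard:
        subfolders.append("tensorboard-output")

    return {name: f"{root}/{name}/" for name in subfolders}


# Hypothetical values for illustration only.
for name, uri in debugger_output_uris(
    "us-west-2", "111122223333", "debugger-demo", with_tensorboard=True
).items():
    print(f"{name}: {uri}")
```

The account ID is taken as a string so that leading zeros are preserved.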
See the following topics for more detailed examples of how to configure the Debugger-specific parameters.