Monitor AWS compute resource utilization in Amazon SageMaker Studio Classic
To track compute resource utilization of your training job, use the monitoring tools offered by Amazon SageMaker Debugger.
For any training job you run in SageMaker using the SageMaker Python SDK, Debugger collects basic resource utilization metrics, such as CPU utilization, GPU utilization, GPU memory utilization, network, and I/O wait time, every 500 milliseconds by default. To see the dashboard of the resource utilization metrics of your training job, use the SageMaker Debugger UI in SageMaker Studio Experiments.
Deep learning operations and steps might run in intervals of milliseconds. Compared to Amazon CloudWatch metrics, which are collected at intervals of 1 second, Debugger can collect resource utilization metrics at intervals as short as 100 milliseconds (0.1 second), so you can dive deep into the metrics at the level of an operation or a step.
If you want to change the metric collection time interval, you can add a parameter for profiling configuration to your training job launcher. For example, if you're using the SageMaker Python SDK, pass the profiler_config parameter when you create an estimator object. To learn how to adjust the resource utilization metric collection interval, see Code template for configuring a SageMaker estimator object with the SageMaker Debugger Python modules in the SageMaker Python SDK and then Configure settings for basic profiling of system resource utilization.
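As a rough illustration of the profiler_config parameter described above, the following minimal sketch builds a profiling configuration with a custom collection interval. It assumes the ProfilerConfig class and its system_monitor_interval_millis argument from the sagemaker.debugger module of the SageMaker Python SDK. The helper names check_interval and build_profiler_config, and the tuple of accepted interval values, are assumptions made for this example, so confirm the valid values against the SDK documentation for your version.

```python
# Collection intervals in milliseconds that this sketch ASSUMES the SDK accepts.
# Verify the list against the SageMaker Python SDK documentation.
ASSUMED_VALID_INTERVALS_MS = (100, 200, 500, 1000, 5000, 60000)


def check_interval(interval_ms):
    """Return interval_ms unchanged if it is in the assumed valid set.

    Hypothetical helper for this example; it is not part of the SDK.
    """
    if interval_ms not in ASSUMED_VALID_INTERVALS_MS:
        raise ValueError(
            f"{interval_ms} ms is not one of {ASSUMED_VALID_INTERVALS_MS}"
        )
    return interval_ms


def build_profiler_config(interval_ms=500):
    """Create a ProfilerConfig that samples system metrics every interval_ms.

    Hypothetical helper for this example; it only wraps the SDK class.
    """
    # Imported lazily so that check_interval works without the SDK installed.
    from sagemaker.debugger import ProfilerConfig

    return ProfilerConfig(
        system_monitor_interval_millis=check_interval(interval_ms)
    )
```

You would then pass the result to an estimator constructor, for example profiler_config=build_profiler_config(100), alongside your usual estimator arguments. A shorter interval gives finer-grained metrics at the cost of collecting more data.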
Additionally, you can add issue-detecting tools called built-in profiling rules, which are provided by SageMaker Debugger. The built-in profiling rules analyze the resource utilization metrics and detect computational performance issues. For more information, see Configure built-in profiler rules managed by Amazon SageMaker Debugger.
You can receive rule analysis results through the SageMaker Debugger UI in SageMaker Studio Experiments or the SageMaker Debugger Profiling Report.
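The built-in profiling rules mentioned above could be attached to a training job along the following lines. This is a sketch, assuming the ProfilerRule class and the rule_configs module exported by sagemaker.debugger in the SageMaker Python SDK. The two rule names chosen here, and the helper build_profiler_rules itself, are picked for illustration only. See the list of built-in profiler rules for the rules that suit your workload.

```python
# Names of built-in profiler rules selected for this example (an assumption;
# replace them with the rules you need).
SELECTED_RULES = ("ProfilerReport", "LowGPUUtilization")


def build_profiler_rules(rule_names=SELECTED_RULES):
    """Return a list of built-in profiler rules for an estimator's rules argument.

    Hypothetical helper for this example; it is not part of the SDK.
    """
    # Imported lazily so that this module loads without the SDK installed.
    from sagemaker.debugger import ProfilerRule, rule_configs

    rules = []
    for name in rule_names:
        # Look up the built-in rule class by name and instantiate it
        # with its default settings.
        rule_class = getattr(rule_configs, name)
        rules.append(ProfilerRule.sagemaker(rule_class()))
    return rules
```

The returned list is intended to be passed as rules=build_profiler_rules() when you create the estimator, so that Debugger evaluates the rules against the collected metrics while the training job runs.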
To learn more about monitoring functionalities provided by SageMaker Debugger, see the following topics.
Topics
- Configure an estimator with parameters for basic profiling using the Amazon SageMaker Debugger Python modules
- Configure built-in profiler rules managed by Amazon SageMaker Debugger
- List of Debugger built-in profiler rules
- Amazon SageMaker Debugger UI in Amazon SageMaker Studio Classic Experiments
- SageMaker Debugger interactive report
- Analyze data using the Debugger Python client library