Actions on rules using Amazon CloudWatch and AWS Lambda
Amazon CloudWatch collects Amazon SageMaker AI model training job logs and Amazon SageMaker Debugger rule processing job logs. Configure Debugger with Amazon CloudWatch Events and AWS Lambda to take action based on Debugger rule evaluation status.
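As a rough illustration of the kind of action this setup enables, the following is a minimal sketch of a Lambda handler that stops a training job when a Debugger rule reports an issue. It assumes the function is the target of a CloudWatch Events rule matching SageMaker training job state change events, and that the event's `detail` field carries `TrainingJobName`, `TrainingJobStatus`, and `DebugRuleEvaluationStatuses`. The helper name `rules_with_issues` and the optional `sagemaker_client` argument are hypothetical, added here only so that the logic can be exercised without AWS credentials.

```python
import json


def rules_with_issues(event):
    """Return the names of Debugger rules whose evaluation status is IssuesFound."""
    detail = event.get("detail") or {}
    statuses = detail.get("DebugRuleEvaluationStatuses") or []
    return [
        status.get("RuleConfigurationName")
        for status in statuses
        if status.get("RuleEvaluationStatus") == "IssuesFound"
    ]


def lambda_handler(event, context, sagemaker_client=None):
    """Stop the training job named in the event if any Debugger rule found issues."""
    detail = event.get("detail") or {}
    job_name = detail.get("TrainingJobName")
    flagged = rules_with_issues(event)

    # Only a job that is still running can be stopped.
    if not flagged or detail.get("TrainingJobStatus") != "InProgress":
        return {"statusCode": 200, "body": json.dumps("Nothing to do")}

    if sagemaker_client is None:
        # boto3 is available in the Lambda Python runtime; it is imported here
        # so that the sketch can be tested locally with a stand-in client.
        import boto3

        sagemaker_client = boto3.client("sagemaker")

    sagemaker_client.stop_training_job(TrainingJobName=job_name)
    return {
        "statusCode": 200,
        "body": json.dumps({"stopped": job_name, "rules": flagged}),
    }
```

The Lambda execution role would also need permission to call `sagemaker:StopTrainingJob`; that part of the setup is not shown here.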
Example notebooks
The following example notebooks show how to stop a training job by taking action on Debugger built-in rules with Amazon CloudWatch and AWS Lambda.
- Amazon SageMaker Debugger - Reacting to CloudWatch Events from Rules
This example notebook runs a training job that has a vanishing gradient issue. The Debugger VanishingGradient built-in rule is used while constructing the SageMaker AI TensorFlow estimator. When the Debugger rule detects the issue, the training job is terminated.
- Detect Stalled Training and Invoke Actions Using SageMaker Debugger Rule
This example notebook runs a training script that contains a line forcing it to sleep for 10 minutes. The Debugger StalledTrainingRule built-in rule detects the stall and stops the training job.
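The two built-in rules named in these notebooks can be declared with the SageMaker Python SDK roughly as follows. This is a sketch under stated assumptions, not the notebooks' exact code: the function name `build_debugger_rules`, the `job_name_prefix` argument, and the 120-second threshold are illustrative choices, and the SDK is imported inside the function only so that the snippet loads on machines without it.

```python
def build_debugger_rules(job_name_prefix):
    """Return the VanishingGradient and StalledTrainingRule built-in rules."""
    # Imported lazily so that the sketch can be loaded without the SDK installed.
    from sagemaker.debugger import Rule, rule_configs

    # Flags gradients that become extremely small during training.
    vanishing_gradient = Rule.sagemaker(rule_configs.vanishing_gradient())

    # Flags a job that emits no training progress for `threshold` seconds.
    # With stop_training_on_fire set, the rule itself attempts to stop the
    # training job whose name starts with the given prefix.
    stalled_training = Rule.sagemaker(
        base_config=rule_configs.stalled_training_rule(),
        rule_parameters={
            "threshold": "120",  # illustrative value, in seconds
            "stop_training_on_fire": "True",
            "training_job_name_prefix": job_name_prefix,
        },
    )
    return [vanishing_gradient, stalled_training]
```

The returned list would typically be passed to a SageMaker estimator through its `rules` argument when the estimator is constructed.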