Create Debugger Custom Rules for Training Job Analysis - Amazon SageMaker

You can create custom rules to monitor your training job using the Debugger Rule APIs and the open source smdebug Python library, which provides tools for building your own rule containers.

Prerequisites for Creating Debugger Custom Rules

To create Debugger custom rules, you need the following prerequisites.

Use the Debugger Client Library smdebug to Create a Custom Rule Python Script

The smdebug Rule API provides an interface for setting up your own custom rules. The following Python script is a sample of how to construct a custom rule, CustomGradientRule. This tutorial rule checks whether the gradients are getting too large: at each step it compares the mean absolute value of every tensor in the gradients collection against a threshold, which defaults to 10. The custom rule takes a base trial, which a SageMaker estimator creates when it initiates a training job.

    from smdebug.rules.rule import Rule

    class CustomGradientRule(Rule):
        def __init__(self, base_trial, threshold=10.0):
            super().__init__(base_trial)
            self.threshold = float(threshold)

        def invoke_at_step(self, step):
            for tname in self.base_trial.tensor_names(collection="gradients"):
                t = self.base_trial.tensor(tname)
                abs_mean = t.reduction_value(step, "mean", abs=True)
                if abs_mean > self.threshold:
                    return True
            return False
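The check that invoke_at_step performs can be illustrated without smdebug or a training job. The following is a minimal sketch in plain Python, not part of the smdebug API: the helper names mean_abs and first_violating_step, the tensor name, and the gradient values are all made up for illustration. It applies the same "mean of absolute values greater than threshold" test to a small in-memory history of gradients.

```python
from statistics import fmean


def mean_abs(values):
    """Return the mean of the absolute values of a flat list of numbers."""
    return fmean(abs(v) for v in values)


def first_violating_step(gradients_by_step, threshold=10.0):
    """Return the first step whose gradients exceed the threshold, else None.

    gradients_by_step maps a step number to a dict of
    {tensor_name: list_of_gradient_values} (a hypothetical stand-in
    for the tensors a trial would expose).
    """
    for step in sorted(gradients_by_step):
        for values in gradients_by_step[step].values():
            # Same comparison as in CustomGradientRule.invoke_at_step.
            if mean_abs(values) > threshold:
                return step
    return None


# Made-up gradient values, for illustration only.
history = {
    0: {"dense/kernel": [0.5, -1.5, 2.0]},
    1: {"dense/kernel": [4.0, -6.0, 8.0]},
    2: {"dense/kernel": [30.0, -25.0, 35.0]},
}

print(first_violating_step(history))                  # default threshold of 10
print(first_violating_step(history, threshold=5.0))   # stricter threshold
```

With these values the mean absolute gradient is about 1.33, 6, and 30 at steps 0, 1, and 2, so lowering the threshold makes the check trigger at an earlier step.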

You can add as many custom rule classes as you want to the same Python script and deploy them to any training job trial by constructing custom rule objects, as described in the following section.

Use the Debugger APIs to Run Your Own Custom Rules

The following code sample shows how to configure a custom rule with the Amazon SageMaker Python SDK. This example assumes that the custom rule script you created in the previous step is located at 'path/to/'.

    from sagemaker.debugger import Rule, CollectionConfig

    custom_rule = Rule.custom(
        name='MyCustomRule',
        image_uri='',
        instance_type='ml.t3.medium',
        source='path/to/',
        rule_to_invoke='CustomGradientRule',
        collections_to_save=[CollectionConfig("gradients")],
        rule_parameters={"threshold": "20.0"}
    )

The following list explains the Debugger Rule.custom API arguments.

  • name (str): A name of your choice for the custom rule.

  • image_uri (str): The URI of the container image that evaluates your custom rule. The container loads your rule script and evaluates the tensor collections that you save in the training job. For the list of open source SageMaker rule evaluator images, see Amazon SageMaker Debugger Registry URLs for Custom Rule Evaluators.

  • instance_type (str): The instance type on which to run the rule's Docker container. This instance is launched in parallel with the training container.

  • source (str): This is the local path or the Amazon S3 URI to your custom rule script.

  • rule_to_invoke (str): The name of the Rule class in your custom rule script to invoke. SageMaker evaluates only one rule per rule job.

  • collections_to_save (list of CollectionConfig): The tensor collections to save during training so that the rule can evaluate them.

  • rule_parameters (dictionary): Parameter inputs in dictionary format. Use them to adjust the parameters that you defined in the custom rule script, such as threshold.
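Note that the example above passes the threshold as the string "20.0", and the sample rule converts it with float(threshold) in its constructor. The following is a minimal sketch of why that conversion matters, assuming parameter values reach the rule's constructor as strings, as the example suggests. ThresholdRuleSketch is a hypothetical stand-in class, not part of smdebug or the SageMaker Python SDK.

```python
class ThresholdRuleSketch:
    """Hypothetical stand-in that mimics CustomGradientRule's constructor."""

    def __init__(self, threshold=10.0):
        # Accept either a number or a numeric string such as "20.0".
        self.threshold = float(threshold)

    def is_violated(self, abs_mean):
        # Numeric comparison works only because of the float() conversion above.
        return abs_mean > self.threshold


# Same dictionary as the rule_parameters argument in the example above.
rule_parameters = {"threshold": "20.0"}
rule = ThresholdRuleSketch(**rule_parameters)

print(rule.threshold)          # a float, not a string
print(rule.is_violated(25.0))
print(rule.is_violated(15.0))
```

Without the float() call, comparing a number against the string "20.0" would raise a TypeError in Python 3, so converting in the constructor keeps the rule usable both with the default value and with string inputs.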

After you set up the custom_rule object, you can pass it to a SageMaker estimator for any training job. Specify your training script as the entry_point. You do not need to make any changes to your training script.

    import sagemaker
    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        role=sagemaker.get_execution_role(),
        base_job_name='smdebug-custom-rule-demo-tf-keras',
        entry_point='path/to/',
        train_instance_type='ml.p2.xlarge',  # named instance_type in SageMaker Python SDK v2 and later
        # ... (other estimator arguments)

        # debugger-specific arguments below
        rules=[custom_rule]
    )

For more variations and advanced examples of using Debugger custom rules, see the following example notebooks.