PoorWeightInitialization Rule - Amazon SageMaker

This rule detects if your model parameters have been poorly initialized.

Good initialization breaks the symmetry of the weights and gradients in a neural network and maintains commensurate activation variances across layers. Otherwise, the neural network doesn't learn effectively. Initializers like Xavier aim to keep variance constant across activations, which is especially relevant for training very deep neural nets. Too small an initialization can lead to vanishing gradients. Too large an initialization can lead to exploding gradients. This rule checks the variance of activation inputs across layers, the distribution of gradients, and the loss convergence for the initial steps to determine if a neural network has been poorly initialized.
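The first two checks described above can be sketched with NumPy on synthetic tensors. The variable names and cutoff values mirror this rule's parameters (threshold, distribution_range), but this is an illustration of the idea, not the SageMaker Debugger implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated activation inputs for three layers at one training step.
# The third layer is poorly scaled, breaking commensurate variances.
activations = {
    "layer1_relu_input": rng.normal(0.0, 1.0, size=10_000),
    "layer2_relu_input": rng.normal(0.0, 1.0, size=10_000),
    "layer3_relu_input": rng.normal(0.0, 0.01, size=10_000),
}

# Check 1: ratio between the smallest and largest activation-input
# variance across layers; a large ratio suggests poor initialization.
variances = [a.var() for a in activations.values()]
variance_ratio = max(variances) / min(variances)
poorly_scaled = variance_ratio > 10.0  # threshold parameter

# Check 2: spread of the gradient distribution. A tiny gap between
# the 5th and 95th percentiles suggests vanishing gradients.
gradients = rng.normal(0.0, 1e-5, size=10_000)
p5, p95 = np.percentile(gradients, [5, 95])
vanishing = (p95 - p5) < 0.001  # distribution_range parameter

print(f"variance ratio across layers: {variance_ratio:.1f}")
print(f"vanishing gradients: {vanishing}")
```

With the synthetic data above, both checks fire: the variance ratio is far above 10.0, and the gradient spread is far below 0.001.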

Parameter Descriptions for the PoorWeightInitialization Rule

base_trial
    The trial run using this rule. The rule inspects the tensors gathered from this trial.
    Valid values: String

tensor_regex
    A list of regex patterns used to restrict this comparison to specific scalar-valued tensors. The rule inspects only the tensors that match the regex patterns specified in the list. If no patterns are passed, the rule compares all tensors gathered in the trials by default. Only scalar-valued tensors can be matched.
    Valid values: List of strings or a comma-separated string
    Default value: ".*relu_input"

threshold
    If the ratio between the minimum and maximum variance of the weights per layer exceeds the threshold at a step, the rule returns True.
    Valid values: Float
    Default value: 10.0

distribution_range
    If the minimum difference between the 5th and 95th percentiles of the gradient distribution is less than the distribution_range, the rule returns True.
    Valid values: Float
    Default value: 0.001

patience
    The number of steps to wait until the loss is considered to no longer be decreasing.
    Valid values: Integer
    Default value: 5

steps
    The number of steps this rule analyzes. You typically want to check only the first few iterations.
    Valid values: Float
    Default value: 10
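The loss-convergence check implied by the patience parameter can be sketched as early-stopping-style bookkeeping: if the loss fails to improve for `patience` consecutive steps within the first `steps` iterations, initialization is suspect. This logic is an assumption for illustration, not the rule's actual implementation.

```python
# Simulated per-step training losses that stall near their initial value.
losses = [2.31, 2.30, 2.31, 2.31, 2.30, 2.31, 2.30, 2.31, 2.31, 2.30]

patience = 5   # steps to wait before declaring the loss non-decreasing
steps = 10     # only the first few iterations are analyzed

best = float("inf")
stalled = 0
flagged = False
for loss in losses[:steps]:
    if loss < best:
        best = loss
        stalled = 0
    else:
        stalled += 1
    if stalled >= patience:
        flagged = True
        break

print(f"loss stopped decreasing within {patience} steps: {flagged}")
```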

For an example of how to configure and deploy a built-in rule, see How to Use Built-in Rules for Model Analysis.
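As a minimal configuration sketch, the built-in rule can be attached to a training job through the SageMaker Python SDK. This assumes the `sagemaker` package and an AWS training environment; the parameter overrides shown are optional and default to the values documented above.

```python
from sagemaker.debugger import Rule, rule_configs

# Built-in PoorWeightInitialization rule with explicit parameter overrides.
# Rule parameter values are passed as strings.
poor_init_rule = Rule.sagemaker(
    rule_configs.poor_weight_initialization(),
    rule_parameters={
        "threshold": "10.0",
        "distribution_range": "0.001",
        "patience": "5",
        "steps": "10",
    },
)

# Pass rules=[poor_init_rule] to your sagemaker Estimator when creating
# the training job; Debugger then evaluates the rule during training.
```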


Note: This rule can't be applied to the XGBoost algorithm.