Define a scaling policy - Amazon SageMaker

Define a scaling policy

Before you add a scaling policy to your model, save your policy configuration as a JSON block in a text file. You use that text file when invoking the AWS Command Line Interface (AWS CLI) or the Application Auto Scaling API. You can optimize scaling by choosing an appropriate CloudWatch metric. However, before using a custom metric in production, you must test auto scaling with that metric.

This section shows you example policy configurations for target tracking scaling policies.

Specify a predefined metric (CloudWatch metric: InvocationsPerInstance)

The following is an example target tracking policy configuration for a variant that keeps the average invocations per instance at 70. Save this configuration in a file named config.json.

{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    }
}

For more information, see TargetTrackingScalingPolicyConfiguration in the Application Auto Scaling API Reference.
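If you prefer to generate the file rather than write it by hand, the following is a minimal Python sketch that builds the same configuration and saves it as config.json. The helper names predefined_metric_config and write_policy_config are hypothetical and are not part of any AWS SDK. The sketch writes to a temporary directory so that it does not overwrite an existing file.

```python
import json
import os
import tempfile


def predefined_metric_config(target_value,
                             metric_type="SageMakerVariantInvocationsPerInstance"):
    """Build a target tracking configuration that uses a predefined metric."""
    return {
        "TargetValue": float(target_value),
        "PredefinedMetricSpecification": {"PredefinedMetricType": metric_type},
    }


def write_policy_config(config, path):
    """Write the configuration as indented JSON and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return path


# Hypothetical helper usage: keep average invocations per instance at 70.
config = predefined_metric_config(70)
config_path = write_policy_config(
    config, os.path.join(tempfile.mkdtemp(), "config.json")
)

# Read the file back to confirm that it is valid JSON with the expected content.
with open(config_path, encoding="utf-8") as f:
    loaded = json.load(f)
print(json.dumps(loaded, indent=4))
```

You can then pass the saved file to the AWS CLI, for example with the --target-tracking-scaling-policy-configuration file://config.json option of the aws application-autoscaling put-scaling-policy command.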

Define a custom metric (CloudWatch metric: CPUUtilization)

To create a target tracking scaling policy with a custom metric, specify the metric's name, namespace, unit, statistic, and zero or more dimensions. A dimension consists of a dimension name and a dimension value. You can use any production variant metric that changes in proportion to capacity.

The following example configuration shows a target tracking scaling policy with a custom metric. The policy scales the variant based on an average CPU utilization of 50 percent across all instances. Save this configuration in a file named config.json.

{
    "TargetValue": 50.0,
    "CustomizedMetricSpecification": {
        "MetricName": "CPUUtilization",
        "Namespace": "/aws/sagemaker/Endpoints",
        "Dimensions": [
            {"Name": "EndpointName", "Value": "my-endpoint"},
            {"Name": "VariantName", "Value": "my-variant"}
        ],
        "Statistic": "Average",
        "Unit": "Percent"
    }
}

For more information, see CustomizedMetricSpecification in the Application Auto Scaling API Reference.
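The following is a minimal Python sketch of how the pieces of a custom metric specification (name, namespace, statistic, optional unit, and zero or more dimensions) fit together. The custom_metric_config helper is hypothetical. The statistic check reflects the values that Application Auto Scaling accepts for a customized metric; the sketch does not verify that the metric actually exists in CloudWatch.

```python
import json

# Statistics accepted by CustomizedMetricSpecification in Application Auto Scaling.
VALID_STATISTICS = {"Average", "Minimum", "Maximum", "SampleCount", "Sum"}


def custom_metric_config(target_value, metric_name, namespace, statistic,
                         dimensions=None, unit=None):
    """Build a target tracking configuration that uses a custom metric.

    dimensions is an optional mapping of dimension name to dimension value.
    """
    if statistic not in VALID_STATISTICS:
        raise ValueError(f"Unsupported statistic: {statistic!r}")
    spec = {
        "MetricName": metric_name,
        "Namespace": namespace,
        "Dimensions": [
            {"Name": name, "Value": value}
            for name, value in (dimensions or {}).items()
        ],
        "Statistic": statistic,
    }
    if unit is not None:
        spec["Unit"] = unit  # The unit is optional, so add it only when given.
    return {"TargetValue": float(target_value), "CustomizedMetricSpecification": spec}


# Rebuild the CPUUtilization example: 50 percent average CPU across all instances.
cpu_config = custom_metric_config(
    50,
    "CPUUtilization",
    "/aws/sagemaker/Endpoints",
    "Average",
    dimensions={"EndpointName": "my-endpoint", "VariantName": "my-variant"},
    unit="Percent",
)
print(json.dumps(cpu_config, indent=4))
```

Validating the statistic locally surfaces a typo before the request reaches the API, which can be useful when you generate configurations for many variants.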

Define a custom metric (CloudWatch metric: ExplanationsPerInstance)

When the endpoint has online explainability activated, it emits an ExplanationsPerInstance metric that reports the average number of records explained per minute, per instance, for a variant. The resource utilization of explaining records can differ substantially from that of predicting records. We strongly recommend using this metric for target tracking scaling of endpoints with online explainability activated.

You can create multiple target tracking policies for a scalable target. Consider adding the InvocationsPerInstance policy from the Specify a predefined metric (CloudWatch metric: InvocationsPerInstance) section in addition to the ExplanationsPerInstance policy. If most invocations don't return an explanation because of the threshold value set in the EnableExplanations parameter, then the InvocationsPerInstance policy can drive scaling. If there is a large number of explanations, the ExplanationsPerInstance policy can drive scaling instead.
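To see how two policies on the same variant can interact, consider the following Python sketch. It assumes a simplified proportional model in which each policy asks for enough instances to bring its metric back to the target, and the largest request wins. This matches the documented behavior that a scalable target scales out when any policy is ready to scale out and scales in only when all policies are ready to scale in. The function names and the sample numbers are hypothetical, and the model ignores cooldowns, alarm evaluation periods, and minimum and maximum capacity.

```python
import math


def desired_instances(current_instances, metric_per_instance, target_value):
    """Simplified estimate of the capacity that returns the metric to its target."""
    return max(1, math.ceil(current_instances * metric_per_instance / target_value))


def combined_desired_instances(current_instances, observations):
    """Combine several policies by taking the largest requested capacity.

    observations is a list of (metric_per_instance, target_value) pairs,
    one pair per target tracking policy.
    """
    return max(
        desired_instances(current_instances, metric, target)
        for metric, target in observations
    )


# Pairs are (InvocationsPerInstance, target 70) and (ExplanationsPerInstance, target 20).
# Few explanations: the invocations policy asks for more capacity.
mostly_predictions = combined_desired_instances(4, [(140, 70), (5, 20)])
# Many explanations: the explanations policy asks for more capacity.
many_explanations = combined_desired_instances(4, [(60, 70), (50, 20)])
# Both metrics are at half of their targets, so both policies agree to scale in.
both_low = combined_desired_instances(4, [(35, 70), (10, 20)])

print(mostly_predictions, many_explanations, both_low)
```

In this model, capacity drops only in the last case, where every policy requests fewer instances than the variant currently has.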

The following example configuration shows a target tracking scaling policy with a custom metric. The policy adjusts the number of variant instances so that each instance has an ExplanationsPerInstance metric of 20. Save this configuration in a file named config.json.

{
    "TargetValue": 20.0,
    "CustomizedMetricSpecification": {
        "MetricName": "ExplanationsPerInstance",
        "Namespace": "AWS/SageMaker",
        "Dimensions": [
            {"Name": "EndpointName", "Value": "my-endpoint"},
            {"Name": "VariantName", "Value": "my-variant"}
        ],
        "Statistic": "Sum"
    }
}

For more information, see CustomizedMetricSpecification in the Application Auto Scaling API Reference.

Specify cooldown periods

You can optionally define cooldown periods in your target tracking scaling policy by specifying the ScaleOutCooldown and ScaleInCooldown parameters.

The following is an example target tracking policy configuration for a variant that keeps the average invocations per instance at 70. The policy configuration provides a scale-in cooldown period of 10 minutes (600 seconds) and a scale-out cooldown period of 5 minutes (300 seconds). Save this configuration in a file named config.json.

{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    },
    "ScaleInCooldown": 600,
    "ScaleOutCooldown": 300
}

For more information, see TargetTrackingScalingPolicyConfiguration in the Application Auto Scaling API Reference.
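Because ScaleInCooldown and ScaleOutCooldown are expressed in seconds, it can help to convert from minutes in one place. The following Python sketch adds optional cooldowns to an existing configuration. The with_cooldowns helper is hypothetical, and it returns a copy so that the original configuration stays unchanged.

```python
import json


def with_cooldowns(config, scale_in_minutes=None, scale_out_minutes=None):
    """Return a copy of config with cooldowns, given in minutes, added in seconds."""
    updated = dict(config)
    cooldowns = (
        ("ScaleInCooldown", scale_in_minutes),
        ("ScaleOutCooldown", scale_out_minutes),
    )
    for key, minutes in cooldowns:
        if minutes is None:
            continue  # Cooldowns are optional, so skip any that are not given.
        if minutes < 0:
            raise ValueError(f"{key} must not be negative")
        updated[key] = int(minutes * 60)
    return updated


base_config = {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
    },
}

# A scale-in cooldown of 10 minutes and a scale-out cooldown of 5 minutes.
cooldown_config = with_cooldowns(base_config, scale_in_minutes=10, scale_out_minutes=5)
print(json.dumps(cooldown_config, indent=4))
```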