Define a scaling policy
Before you add a scaling policy to your model, save your policy configuration as a JSON block in a text file. You use that text file when invoking the AWS Command Line Interface (AWS CLI) or the Application Auto Scaling API. You can optimize scaling by choosing an appropriate CloudWatch metric. However, before using a custom metric in production, you must test auto scaling with your custom metric.
This section shows you example policy configurations for target tracking scaling policies.
Specify a predefined metric (CloudWatch metric: InvocationsPerInstance)
The following is an example target tracking policy configuration for a variant that keeps the average invocations per instance at 70. Save this configuration in a file named config.json.
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
  }
}
For more information, see TargetTrackingScalingPolicyConfiguration in the Application Auto Scaling API Reference.
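If you prefer to generate the file instead of writing it by hand, the following is a minimal Python sketch that builds the same predefined-metric configuration and saves it. The helper names build_predefined_policy and save_policy are hypothetical and not part of any AWS SDK; the sketch only uses the Python standard library.

```python
import json
from pathlib import Path


def build_predefined_policy(target_value):
    """Return a target tracking configuration that uses the predefined
    SageMakerVariantInvocationsPerInstance metric."""
    return {
        "TargetValue": float(target_value),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    }


def save_policy(config, path="config.json"):
    """Write the configuration as JSON text and return the file path."""
    out = Path(path)
    out.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return out


if __name__ == "__main__":
    # Keep the average invocations per instance at 70, as in the example above.
    saved = save_policy(build_predefined_policy(70))
    print(saved.read_text(encoding="utf-8"))
```

The resulting config.json can then be passed to the AWS CLI or the Application Auto Scaling API in the same way as a hand-written file.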
Define a custom metric (CloudWatch metric: CPUUtilization)
To create a target tracking scaling policy with a custom metric, specify the metric's name, namespace, unit, statistic, and zero or more dimensions. A dimension consists of a dimension name and a dimension value. You can use any production variant metric that changes in proportion to capacity.
The following example configuration shows a target tracking scaling policy with a custom metric. The policy scales the variant based on an average CPU utilization of 50 percent across all instances. Save this configuration in a file named config.json.
{
  "TargetValue": 50.0,
  "CustomizedMetricSpecification": {
    "MetricName": "CPUUtilization",
    "Namespace": "/aws/sagemaker/Endpoints",
    "Dimensions": [
      {"Name": "EndpointName", "Value": "my-endpoint"},
      {"Name": "VariantName", "Value": "my-variant"}
    ],
    "Statistic": "Average",
    "Unit": "Percent"
  }
}
For more information, see CustomizedMetricSpecification in the Application Auto Scaling API Reference.
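As an illustration of how the name, namespace, statistic, unit, and dimensions fit together, the following Python sketch assembles a custom-metric configuration from plain arguments. The function build_custom_metric_policy is a hypothetical helper, not an AWS API. The set of accepted statistics reflects the values that Application Auto Scaling documents for CustomizedMetricSpecification; treat it as an assumption to verify against the API reference.

```python
import json

# Statistic values accepted by CustomizedMetricSpecification
# (assumption: check the API reference for the current list).
VALID_STATISTICS = {"Average", "Minimum", "Maximum", "SampleCount", "Sum"}


def build_custom_metric_policy(target_value, metric_name, namespace,
                               dimensions, statistic, unit=None):
    """Return a target tracking configuration for a custom CloudWatch metric.

    dimensions is a dict that maps dimension names to dimension values;
    it may be empty because dimensions are optional.
    """
    if statistic not in VALID_STATISTICS:
        raise ValueError(f"Unsupported statistic: {statistic!r}")
    spec = {
        "MetricName": metric_name,
        "Namespace": namespace,
        "Dimensions": [
            {"Name": name, "Value": value} for name, value in dimensions.items()
        ],
        "Statistic": statistic,
    }
    if unit is not None:
        spec["Unit"] = unit
    return {
        "TargetValue": float(target_value),
        "CustomizedMetricSpecification": spec,
    }


if __name__ == "__main__":
    config = build_custom_metric_policy(
        target_value=50,
        metric_name="CPUUtilization",
        namespace="/aws/sagemaker/Endpoints",
        dimensions={"EndpointName": "my-endpoint", "VariantName": "my-variant"},
        statistic="Average",
        unit="Percent",
    )
    print(json.dumps(config, indent=2))
```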
Define a custom metric (CloudWatch metric: ExplanationsPerInstance)
When the endpoint has online explainability activated, it emits an ExplanationsPerInstance metric that reports the average number of records explained per minute, per instance, for a variant. The resource utilization of explaining records can differ substantially from that of predicting records. We strongly recommend using this metric for target tracking scaling of endpoints with online explainability activated.
You can create multiple target tracking policies for a scalable target. Consider adding the InvocationsPerInstance policy from the Specify a predefined metric (CloudWatch metric: InvocationsPerInstance) section in addition to the ExplanationsPerInstance policy. If most invocations don't return an explanation because of the threshold value set in the EnableExplanations parameter, then the endpoint can choose the InvocationsPerInstance policy. If there is a large number of explanations, the endpoint can use the ExplanationsPerInstance policy.
The following example configuration shows a target tracking scaling policy with a custom metric. The policy adjusts the number of variant instances so that each instance has an ExplanationsPerInstance metric of 20. Save this configuration in a file named config.json.
{
  "TargetValue": 20.0,
  "CustomizedMetricSpecification": {
    "MetricName": "ExplanationsPerInstance",
    "Namespace": "AWS/SageMaker",
    "Dimensions": [
      {"Name": "EndpointName", "Value": "my-endpoint"},
      {"Name": "VariantName", "Value": "my-variant"}
    ],
    "Statistic": "Sum"
  }
}
For more information, see CustomizedMetricSpecification in the Application Auto Scaling API Reference.
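Because a scalable target can carry both an InvocationsPerInstance policy and an ExplanationsPerInstance policy, it can help to see the two requests side by side. The following Python sketch only builds the request parameters and makes no AWS calls. The policy names and the helper build_policy_requests are hypothetical. The ServiceNamespace, ResourceId format, ScalableDimension, and PolicyType values are the ones Application Auto Scaling uses for SageMaker endpoint variants. Each resulting dictionary could, under that assumption, be passed as keyword arguments to put_scaling_policy on a boto3 application-autoscaling client.

```python
import json


def build_policy_requests(endpoint_name, variant_name, policies):
    """Return one request dict per named target tracking configuration.

    policies maps a policy name to its target tracking configuration.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    return [
        {
            "PolicyName": name,
            "ServiceNamespace": "sagemaker",
            "ResourceId": resource_id,
            "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
            "PolicyType": "TargetTrackingScaling",
            "TargetTrackingScalingPolicyConfiguration": config,
        }
        for name, config in policies.items()
    ]


# Hypothetical policy names; the configurations mirror the examples above.
POLICIES = {
    "invocations-per-instance": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
    "explanations-per-instance": {
        "TargetValue": 20.0,
        "CustomizedMetricSpecification": {
            "MetricName": "ExplanationsPerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [
                {"Name": "EndpointName", "Value": "my-endpoint"},
                {"Name": "VariantName", "Value": "my-variant"},
            ],
            "Statistic": "Sum",
        },
    },
}

if __name__ == "__main__":
    for request in build_policy_requests("my-endpoint", "my-variant", POLICIES):
        print(json.dumps(request, indent=2))
```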
Specify cooldown periods
You can optionally define cooldown periods in your target tracking scaling policy by specifying the ScaleOutCooldown and ScaleInCooldown parameters.
The following is an example target tracking policy configuration for a variant that keeps the average invocations per instance at 70. The policy configuration provides a scale-in cooldown period of 10 minutes (600 seconds) and a scale-out cooldown period of 5 minutes (300 seconds). Save this configuration in a file named config.json.
{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
  },
  "ScaleInCooldown": 600,
  "ScaleOutCooldown": 300
}
For more information, see TargetTrackingScalingPolicyConfiguration in the Application Auto Scaling API Reference.
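The cooldown parameters are expressed in seconds, so a small conversion step can prevent unit mistakes. The following Python sketch adds cooldowns, given in minutes, to an existing configuration. The helper with_cooldowns is hypothetical and returns a copy instead of modifying its input.

```python
import json


def with_cooldowns(config, scale_in_minutes=None, scale_out_minutes=None):
    """Return a copy of config with cooldown periods set in seconds.

    A value of None leaves the corresponding parameter out of the result.
    """
    updated = dict(config)
    cooldowns = (
        ("ScaleInCooldown", scale_in_minutes),
        ("ScaleOutCooldown", scale_out_minutes),
    )
    for key, minutes in cooldowns:
        if minutes is None:
            continue
        if minutes < 0:
            raise ValueError(f"{key} must not be negative")
        # The policy configuration expects whole seconds.
        updated[key] = int(minutes * 60)
    return updated


if __name__ == "__main__":
    base = {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    }
    # 10-minute scale-in and 5-minute scale-out cooldowns, as in the example.
    print(json.dumps(with_cooldowns(base, 10, 5), indent=2))
```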