Update or delete endpoints that use automatic scaling
Update endpoints that use automatic scaling
When you update an endpoint, Application Auto Scaling checks to see whether any of the models on that endpoint are targets for automatic scaling. If the update would change the instance type for any model that is a target for automatic scaling, the update fails.
In the AWS Management Console, you see a warning that you must deregister the model from automatic scaling before you can update it. If you try to update the endpoint by calling the UpdateEndpoint API, the call fails. Before you update the endpoint, delete any scaling policies configured for it by calling the DeleteScalingPolicy Application Auto Scaling API action, then call DeregisterScalableTarget to deregister the variant as a scalable target. After you update the endpoint, you can register the variant as a scalable target again and attach an automatic scaling policy to the updated variant.
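The sketch below shows the sequence just described: delete the variant's policies, deregister it, and later register it again with a policy. It is a minimal sketch with these assumptions:

- Python, with a boto3-style Application Auto Scaling client passed in as `aas` (for example `boto3.client("application-autoscaling")`).
- The helper names, the policy name `invocations-target-tracking`, and the target-tracking settings are hypothetical choices for illustration, not values from this page.
- Only the first page of DescribeScalingPolicies results is read.

```python
from typing import Any, Dict, List

# Identifiers Application Auto Scaling uses for SageMaker production variants.
SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def target_key(endpoint_name: str, variant_name: str) -> Dict[str, str]:
    """Parameters that identify one variant's scalable target."""
    return {
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": SCALABLE_DIMENSION,
    }


def remove_autoscaling(aas: Any, endpoint_name: str, variant_name: str) -> List[str]:
    """Delete the variant's scaling policies, then deregister its scalable target.

    Returns the names of the deleted policies. No NextToken handling.
    """
    key = target_key(endpoint_name, variant_name)
    policies = aas.describe_scaling_policies(**key)["ScalingPolicies"]
    for policy in policies:
        aas.delete_scaling_policy(PolicyName=policy["PolicyName"], **key)
    aas.deregister_scalable_target(**key)
    return [policy["PolicyName"] for policy in policies]


def restore_autoscaling(
    aas: Any,
    endpoint_name: str,
    variant_name: str,
    min_capacity: int,
    max_capacity: int,
    target_invocations_per_instance: float,
    policy_name: str = "invocations-target-tracking",  # hypothetical policy name
) -> None:
    """Register the variant as a scalable target again and attach a target-tracking policy."""
    key = target_key(endpoint_name, variant_name)
    aas.register_scalable_target(MinCapacity=min_capacity, MaxCapacity=max_capacity, **key)
    aas.put_scaling_policy(
        PolicyName=policy_name,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target_invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
        },
        **key,
    )
```

Call `remove_autoscaling` before UpdateEndpoint and `restore_autoscaling` after the endpoint is back in service. The restored policy uses whatever settings you pass in. If you need identical behavior, copy the original policy configuration before deleting it.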
There is one exception. If you change the model for a variant that is configured for automatic scaling, Amazon SageMaker automatic scaling allows the update. This is because changing the model doesn't typically affect performance enough to change automatic scaling behavior. If you do update a model for a variant configured for automatic scaling, ensure that the change to the model doesn't significantly affect performance and automatic scaling behavior.
When you update SageMaker endpoints that have automatic scaling applied, complete the following steps:
To update an endpoint that has automatic scaling applied
1. Deregister the endpoint's variants as scalable targets by calling DeregisterScalableTarget.
2. Because automatic scaling is blocked while the update operation is in progress (or because you turned off automatic scaling in the previous step), you might want to take the additional precaution of increasing the number of instances for your endpoint during the update. To do this, update the instance counts for the production variants hosted at the endpoint by calling UpdateEndpointWeightsAndCapacities.
3. Call DescribeEndpoint repeatedly until the value of the EndpointStatus field of the response is InService.
4. Call DescribeEndpointConfig to get the values of the current endpoint config.
5. Create a new endpoint config by calling CreateEndpointConfig. For the production variants whose existing instance count or weight you want to keep, use the same variant names as in the DescribeEndpointConfig response from the previous step. For all other values, use the values from that same response.
6. Update the endpoint by calling UpdateEndpoint. Specify the endpoint config you created in the previous step as the EndpointConfigName field. If you want to retain variant properties such as instance count or weight, set the RetainAllVariantProperties parameter to True. Production variants with the same name are then updated with the most recent DesiredInstanceCount from the DescribeEndpoint response, regardless of the value of the InitialInstanceCount field in the new endpoint config.
7. (Optional) Re-enable automatic scaling by calling RegisterScalableTarget, then attach a scaling policy by calling PutScalingPolicy.
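Steps 2 through 6 of the procedure above can be sketched as follows. It is a minimal sketch with these assumptions:

- A boto3-style SageMaker client is passed in as `sm` (for example `boto3.client("sagemaker")`).
- The variants are instance-based and report `CurrentInstanceCount`.
- The helper names and the `overrides` mapping (per-variant changes such as a new `InstanceType`) are hypothetical.
- Steps 1 and 7 are left out.
- Only a few optional top-level config settings are copied. Extend `_COPIED_CONFIG_KEYS` if your config uses others.

```python
import time
from typing import Any, Dict, List

# Optional top-level CreateEndpointConfig settings copied from the current config.
_COPIED_CONFIG_KEYS = ("DataCaptureConfig", "KmsKeyId", "AsyncInferenceConfig")


def headroom_capacities(endpoint: Dict[str, Any], extra_instances: int) -> List[Dict[str, Any]]:
    """Step 2: request `extra_instances` more instances on every variant."""
    return [
        {
            "VariantName": v["VariantName"],
            "DesiredInstanceCount": v["CurrentInstanceCount"] + extra_instances,
        }
        for v in endpoint["ProductionVariants"]
    ]


def wait_in_service(sm: Any, endpoint_name: str, poll_seconds: float = 30,
                    timeout_seconds: float = 3600) -> None:
    """Step 3: poll DescribeEndpoint until EndpointStatus is InService."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sm.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return
        if status == "Failed":
            raise RuntimeError(f"endpoint {endpoint_name} is in Failed state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"endpoint {endpoint_name} is still {status}")
        time.sleep(poll_seconds)


def build_config_request(current: Dict[str, Any], new_name: str,
                         overrides: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Steps 4-5: copy the current config, keeping variant names, applying overrides."""
    variants = []
    for variant in current["ProductionVariants"]:
        new_variant = dict(variant)  # same VariantName, so retained properties still match
        new_variant.update(overrides.get(variant["VariantName"], {}))
        variants.append(new_variant)
    request: Dict[str, Any] = {"EndpointConfigName": new_name, "ProductionVariants": variants}
    for key in _COPIED_CONFIG_KEYS:
        if key in current:
            request[key] = current[key]
    return request


def update_autoscaled_endpoint(sm: Any, endpoint_name: str, new_config_name: str,
                               overrides: Dict[str, Dict[str, Any]],
                               extra_instances: int = 1) -> None:
    """Steps 2-6, assuming the variants were already deregistered (step 1)."""
    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    if extra_instances > 0:
        sm.update_endpoint_weights_and_capacities(
            EndpointName=endpoint_name,
            DesiredWeightsAndCapacities=headroom_capacities(endpoint, extra_instances),
        )
    wait_in_service(sm, endpoint_name)
    current = sm.describe_endpoint_config(EndpointConfigName=endpoint["EndpointConfigName"])
    sm.create_endpoint_config(**build_config_request(current, new_config_name, overrides))
    sm.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
        RetainAllVariantProperties=True,
    )
```

Because `RetainAllVariantProperties=True`, the raised instance counts from step 2 carry over into the updated endpoint. Scale them back down afterward if you only wanted headroom during the update. boto3 also ships an `endpoint_in_service` waiter (`sm.get_waiter("endpoint_in_service")`) that can replace the hand-written polling loop.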
Steps 1 and 7 are required only if you are updating an endpoint with the following changes:
-
Changing the instance type for a production variant that has automatic scaling configured
-
Removing a production variant that has automatic scaling configured.
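To decide whether steps 1 and 7 apply, you can compare the current and new production variants. This is a minimal sketch under these assumptions:

- Variants are dicts shaped like the `ProductionVariants` entries of a DescribeEndpointConfig response.
- The set of automatically scaled variants comes from the `ScalableTargets` list that DescribeScalableTargets returns for the `sagemaker` namespace.
- The function names are hypothetical.

A variant whose only change is its model is not flagged, in line with the model-change exception described earlier.

```python
from typing import Any, Dict, List, Set


def autoscaled_variants(scalable_targets: List[Dict[str, Any]], endpoint_name: str) -> Set[str]:
    """Names of variants on `endpoint_name` that are registered as scalable targets."""
    prefix = f"endpoint/{endpoint_name}/variant/"
    return {
        t["ResourceId"][len(prefix):]
        for t in scalable_targets
        if t["ServiceNamespace"] == "sagemaker" and t["ResourceId"].startswith(prefix)
    }


def variants_needing_deregistration(old_variants: List[Dict[str, Any]],
                                    new_variants: List[Dict[str, Any]],
                                    autoscaled: Set[str]) -> List[str]:
    """Autoscaled variants that are removed or whose instance type changes."""
    new_by_name = {v["VariantName"]: v for v in new_variants}
    result = []
    for old in old_variants:
        name = old["VariantName"]
        if name not in autoscaled:
            continue  # not a scalable target, so the update is not blocked
        new = new_by_name.get(name)
        if new is None or new.get("InstanceType") != old.get("InstanceType"):
            result.append(name)
    return result
```

If the returned list is empty, the update can go ahead without deregistering anything.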
Delete endpoints configured for automatic scaling
If you delete an endpoint, Application Auto Scaling checks to see whether any of the models on that endpoint are targets for automatic scaling. If any are, and you have permission to deregister them, Application Auto Scaling deregisters those models as scalable targets without notifying you. If you use a custom permission policy that doesn't grant permission for the DeleteScalingPolicy and DeregisterScalableTarget actions, you must delete the automatic scaling policies and deregister the scalable targets before deleting the endpoint.
As an IAM user, you might not have sufficient permission to delete an endpoint if another IAM user configured automatic scaling for a variant on that endpoint.
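When you must clean up manually, the order is: delete the scaling policies, deregister the scalable targets, then delete the endpoint. This is a minimal sketch with these assumptions:

- boto3-style clients are passed in as `aas` (Application Auto Scaling) and `sm` (SageMaker).
- The function names are hypothetical.
- Pagination is not handled.

The caller needs permission for every action the sketch calls.

```python
from typing import Any, Dict, List

SERVICE_NAMESPACE = "sagemaker"
SCALABLE_DIMENSION = "sagemaker:variant:DesiredInstanceCount"


def variant_resource_ids(endpoint: Dict[str, Any]) -> List[str]:
    """Application Auto Scaling ResourceIds for every variant on the endpoint."""
    name = endpoint["EndpointName"]
    return [f"endpoint/{name}/variant/{v['VariantName']}" for v in endpoint["ProductionVariants"]]


def delete_autoscaled_endpoint(aas: Any, sm: Any, endpoint_name: str) -> None:
    """Remove scaling policies and scalable targets, then delete the endpoint."""
    endpoint = sm.describe_endpoint(EndpointName=endpoint_name)
    targets = aas.describe_scalable_targets(
        ServiceNamespace=SERVICE_NAMESPACE,
        ResourceIds=variant_resource_ids(endpoint),
        ScalableDimension=SCALABLE_DIMENSION,
    )["ScalableTargets"]
    for target in targets:
        key = {
            "ServiceNamespace": SERVICE_NAMESPACE,
            "ResourceId": target["ResourceId"],
            "ScalableDimension": SCALABLE_DIMENSION,
        }
        # Policies first, then the target they are attached to.
        for policy in aas.describe_scaling_policies(**key)["ScalingPolicies"]:
            aas.delete_scaling_policy(PolicyName=policy["PolicyName"], **key)
        aas.deregister_scalable_target(**key)
    sm.delete_endpoint(EndpointName=endpoint_name)
```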