Automatically Scale Amazon SageMaker Models
Amazon SageMaker supports automatic scaling (auto scaling) for your hosted models. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload. When the workload increases, auto scaling brings more instances online. When the workload decreases, auto scaling removes unnecessary instances so that you don't pay for provisioned instances that you aren't using.
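The following is a simplified sketch of a target-tracking calculation that shows how the instance count can follow the workload. It assumes the tracked metric is invocations per instance per minute. The function name desired_instance_count and its parameters are hypothetical, and the round-up-then-clamp rule is an approximation for illustration, not the exact algorithm that Application Auto Scaling uses.

```python
import math


def desired_instance_count(invocations_per_minute: float,
                           target_per_instance: float,
                           min_instances: int,
                           max_instances: int) -> int:
    """Return the instance count a simplified target-tracking rule would request.

    Hypothetical helper. Assumption: the tracked metric is invocations per
    instance per minute, so the count that meets the target is total load
    divided by the per-instance target, rounded up, then clamped to
    [min_instances, max_instances].
    """
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    # Round up so the per-instance load never exceeds the target.
    needed = math.ceil(invocations_per_minute / target_per_instance)
    # Never go below the floor (availability) or above the ceiling (cost cap).
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    # Workload rises, then falls: the instance count follows it within the bounds.
    for load in [50, 400, 1500, 120, 0]:
        count = desired_instance_count(load, target_per_instance=200,
                                       min_instances=1, max_instances=4)
        print(f"{load} invocations/min -> {count} instance(s)")
```

When load drops, the count falls back toward the minimum, which is where the cost savings described above come from; the maximum caps spend during spikes.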
Topics
- Prerequisites
- Configure model auto scaling with the console
- Register a model
- Define a scaling policy
- Apply a scaling policy
- Edit a scaling policy
- Delete a scaling policy
- Query endpoint auto scaling history
- Update or delete endpoints that use automatic scaling
- Load testing your auto scaling configuration
- Use AWS CloudFormation to update auto scaling policies
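Registering a model and defining a scaling policy both go through the Application Auto Scaling API. The sketch below assembles keyword arguments for the boto3 application-autoscaling client's register_scalable_target and put_scaling_policy calls, using a target-tracking policy on the predefined SageMakerVariantInvocationsPerInstance metric. The helper name build_autoscaling_requests, the endpoint and variant names, the policy-name pattern, and the target value are hypothetical placeholders; choose capacity limits and a target that match your own workload.

```python
from typing import Any, Dict, Tuple


def build_autoscaling_requests(endpoint_name: str,
                               variant_name: str,
                               min_capacity: int,
                               max_capacity: int,
                               target_invocations: float
                               ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build kwargs for register_scalable_target and put_scaling_policy.

    Hypothetical helper: it only builds dictionaries and makes no AWS calls.
    """
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    # A production variant is identified as endpoint/<endpoint>/variant/<variant>.
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    dimension = "sagemaker:variant:DesiredInstanceCount"

    # Step 1: register the variant as a scalable target with capacity bounds.
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    # Step 2: a target-tracking policy on invocations per instance.
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return register, policy


if __name__ == "__main__":
    # "my-endpoint" and "AllTraffic" are example names, not required values.
    register, policy = build_autoscaling_requests(
        "my-endpoint", "AllTraffic", min_capacity=1, max_capacity=4,
        target_invocations=70.0)
    print(register)
    print(policy)
```

Building the dictionaries separately keeps them easy to inspect or test before any AWS call is made. With boto3 installed and credentials configured, you would pass them as client.register_scalable_target(**register) followed by client.put_scaling_policy(**policy), where client = boto3.client("application-autoscaling").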