Update models in production
Deployment guardrails are a set of model deployment options in Amazon SageMaker Inference for updating your machine learning models in production. With these fully managed deployment options, you can control the switch from the current model in production to a new one. Traffic shifting modes, such as canary and linear, give you granular control over how traffic moves from your current model to the new one during the update. Built-in safeguards, such as auto-rollbacks, help you catch issues early and automatically take corrective action before they significantly impact production.
Deployment guardrails provide the following benefits:
Deployment safety while updating production environments. A regressive update to a production environment can cause unplanned downtime and business impact, such as increased model latency and high error rates. Deployment guardrails help you mitigate those risks by providing best practices and built-in operational safety guardrails.
Fully managed deployment. SageMaker sets up and orchestrates these deployments and integrates them with endpoint update mechanisms. You do not need to build and maintain orchestration, monitoring, or rollback mechanisms, so you can focus on using ML in your applications.
Visibility. You can track the progress of your deployment through the DescribeEndpoint API or through Amazon CloudWatch Events (for supported endpoints). To learn more about events in SageMaker, see the Endpoint deployment state change section in Automating Amazon SageMaker with Amazon EventBridge. Note that if your endpoint uses any of the features in the Exclusions page, you cannot use CloudWatch Events.
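As a minimal sketch of tracking progress through the DescribeEndpoint API, the following helper polls the endpoint status until it is no longer in progress. It assumes a boto3 SageMaker client is passed in; the function name `wait_for_deployment`, the polling defaults, and the endpoint name in the usage comment are illustrative assumptions and do not come from this page.

```python
import time

# EndpointStatus values after which an update is no longer in progress.
TERMINAL_STATUSES = {"InService", "Failed", "UpdateRollbackFailed"}


def wait_for_deployment(sm_client, endpoint_name, poll_seconds=30, max_polls=120):
    """Poll DescribeEndpoint until the endpoint reaches a terminal status.

    sm_client is expected to behave like boto3.client("sagemaker").
    The helper name and default intervals are hypothetical.
    """
    status = None
    for _ in range(max_polls):
        response = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        print(f"{endpoint_name}: {status}")
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} still {status} after {max_polls} polls")


# Example usage (requires AWS credentials; "my-endpoint" is a placeholder):
#   import boto3
#   wait_for_deployment(boto3.client("sagemaker"), "my-endpoint")
```

Taking the client as a parameter keeps the helper easy to exercise with a stub before pointing it at a real endpoint.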
Note
Deployment guardrails apply only to Asynchronous inference and Real-time inference endpoint types.
How to Get Started
We support blue/green deployments with multiple traffic shifting modes. A traffic shifting mode is a configuration that specifies how SageMaker routes endpoint traffic to a new fleet containing your updates. The following traffic shifting modes provide you with different levels of control over the endpoint update process:
- Blue/Green: All At Once shifts all of your endpoint traffic from the blue fleet to the green fleet. Once the traffic shifts to the green fleet, your pre-specified Amazon CloudWatch alarms begin monitoring the green fleet for a set amount of time (the baking period). If no alarms trip during the baking period, then SageMaker terminates the blue fleet.
- Blue/Green: Canary lets you shift one small portion of your traffic (a canary) to the green fleet and monitor it for a baking period. If the canary succeeds on the green fleet, then SageMaker shifts the rest of the traffic from the blue fleet to the green fleet before terminating the blue fleet.
- Blue/Green: Linear provides even more customization over the number of traffic-shifting steps and the percentage of traffic to shift for each step. While canary shifting lets you shift traffic in two steps, linear shifting extends this to n linearly spaced steps.
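To make the difference between the three modes concrete, the following sketch computes the cumulative share of traffic on the green fleet after each shift. It is a simplified illustration and not how SageMaker computes capacity internally: the function `green_traffic_schedule` and the whole-number `step_percent` parameter are assumptions made for this example, and baking periods and alarm checks between steps are left out.

```python
def green_traffic_schedule(mode, step_percent=None):
    """Return the cumulative percent of traffic on the green fleet after each shift.

    Illustrative only: mode names mirror the traffic routing types
    (ALL_AT_ONCE, CANARY, LINEAR); step_percent is a whole-number percentage.
    """
    if mode == "ALL_AT_ONCE":
        # One step: everything moves to the green fleet at once.
        return [100]
    if step_percent is None or not 0 < step_percent < 100:
        raise ValueError("step_percent must be between 0 and 100 (exclusive)")
    if mode == "CANARY":
        # Two steps: the canary portion first, then the remainder.
        return [step_percent, 100]
    if mode == "LINEAR":
        # n evenly spaced steps, with the final step capped at 100.
        schedule = list(range(step_percent, 100, step_percent))
        schedule.append(100)
        return schedule
    raise ValueError(f"unknown mode: {mode}")


print(green_traffic_schedule("ALL_AT_ONCE"))  # [100]
print(green_traffic_schedule("CANARY", 10))   # [10, 100]
print(green_traffic_schedule("LINEAR", 25))   # [25, 50, 75, 100]
```

Between consecutive entries in each schedule, the alarms would monitor the green fleet for the baking period before the next shift proceeds.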
You can create and manage your deployment through the UpdateEndpoint and CreateEndpoint SageMaker API operations and the corresponding AWS Command Line Interface commands. See the individual deployment pages for more details on how to set up your deployment. Note that if your endpoint uses any of the features listed in the Exclusions page, you cannot use deployment guardrails.
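As a hedged sketch of what an UpdateEndpoint request with deployment guardrails might look like through boto3, the following builds a canary `DeploymentConfig` with auto-rollback alarms and passes it to `update_endpoint`. The helper names, the timing values, and the endpoint, endpoint configuration, and alarm names in the usage comment are placeholders chosen for illustration; consult the individual deployment pages for the values that suit your endpoint.

```python
def build_canary_deployment_config(alarm_names, canary_percent=10, bake_seconds=600):
    """Build a DeploymentConfig dict for a blue/green canary update.

    All numeric defaults are illustrative assumptions, not recommendations.
    """
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Baking period between the canary shift and the full shift.
                "WaitIntervalInSeconds": bake_seconds,
            },
            # How long to keep the blue fleet after all traffic has shifted.
            "TerminationWaitInSeconds": 300,
            "MaximumExecutionTimeoutInSeconds": 3600,
        },
        "AutoRollbackConfiguration": {
            # CloudWatch alarms that trigger a rollback if they trip.
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


def start_guarded_update(sm_client, endpoint_name, new_config_name, deployment_config):
    """Call UpdateEndpoint with a DeploymentConfig on a boto3-style client."""
    return sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=new_config_name,
        DeploymentConfig=deployment_config,
    )


# Example usage (requires AWS credentials; all names are placeholders):
#   import boto3
#   config = build_canary_deployment_config(["my-latency-alarm"])
#   start_guarded_update(boto3.client("sagemaker"), "my-endpoint",
#                        "my-new-endpoint-config", config)
```

Swapping `"CANARY"` and `CanarySize` for `"LINEAR"` and `LinearStepSize`, or using `"ALL_AT_ONCE"` with no size field, would select the other traffic shifting modes in the same structure.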
To follow guided examples that show how to use deployment guardrails, see our example Jupyter notebooks.