
Best practices


When creating an inference experiment, keep the following information in mind:

  • Traffic sampling percentage – Sampling 100 percent of the inference requests lets you validate that your shadow variant can handle production traffic when promoted. You can start with a lower traffic sampling percentage and increase it as you gain confidence in your variant, but as a best practice, make sure that you have increased the traffic to 100 percent before promotion. For a sketch of how the sampling percentage might be set, see the first example after this list.

  • Instance type – Unless you are using shadow variants to evaluate alternate instance types or sizes, we recommend that you use the same instance type, size, and count as your production variant so that you can be certain that your shadow variant can handle the volume of inference requests after you promote it.

  • Auto scaling – To ensure that your shadow variant can respond to spikes in the number of inference requests or changes in inference request patterns, we highly recommend that you configure autoscaling on your shadow variants. To learn how to configure autoscaling, see Automatic scaling of Amazon SageMaker AI models. If you have configured autoscaling, you can also validate changes to autoscaling policies without impacting users. For a sketch of an autoscaling configuration, see the second example after this list.

  • Metrics monitoring – After you initiate a shadow experiment and have sufficient invocations, monitor the metrics dashboard to ensure that metrics such as latency and error rate are within acceptable bounds. This helps you catch misconfigurations early and take corrective action. For information about how to monitor the metrics of an in-progress inference experiment, see How to view, monitor, and edit shadow tests. For a sketch of pulling variant metrics programmatically, see the third example after this list.
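The following is a minimal sketch of creating a shadow test with the CreateInferenceExperiment API through boto3, showing where the traffic sampling percentage and the per-variant instance configuration are set. All resource names (the endpoint, models, variants, and role ARN) and the instance values are placeholders for illustration, not a definitive configuration.

```python
import boto3

sagemaker = boto3.client("sagemaker")

# Placeholder names throughout: replace the endpoint, model, variant,
# and role values with your own resources.
sagemaker.create_inference_experiment(
    Name="my-shadow-test",
    Type="ShadowMode",
    RoleArn="arn:aws:iam::111122223333:role/my-sagemaker-role",
    EndpointName="my-endpoint",
    ModelVariants=[
        {
            "ModelName": "my-production-model",
            "VariantName": "production-variant",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 2,
                },
            },
        },
        {
            "ModelName": "my-shadow-model",
            "VariantName": "shadow-variant",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                # Same instance type and count as the production variant,
                # per the instance type recommendation above.
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 2,
                },
            },
        },
    ],
    ShadowModeConfig={
        "SourceModelVariantName": "production-variant",
        "ShadowModelVariants": [
            {
                "ShadowModelVariantName": "shadow-variant",
                # Sample 100 percent of requests before promotion so the
                # shadow variant is exercised with the full production load.
                "SamplingPercentage": 100,
            }
        ],
    },
)
```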
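To illustrate the autoscaling recommendation, here is a sketch that registers the shadow variant as a scalable target and attaches a target-tracking policy through Application Auto Scaling. The endpoint and variant names, capacity limits, and target value are assumptions; tune them to your workload.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint and variant names.
resource_id = "endpoint/my-endpoint/variant/shadow-variant"

# Register the shadow variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=6,
)

# Track invocations per instance; the target value is illustrative.
autoscaling.put_scaling_policy(
    PolicyName="shadow-variant-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```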
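Finally, as a sketch of checking metrics programmatically (in addition to the metrics dashboard), the following pulls the shadow variant's ModelLatency statistics from the AWS/SageMaker CloudWatch namespace. The endpoint and variant names and the one-hour window are assumptions.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# ModelLatency (reported in microseconds) for the shadow variant
# over the last hour, in 5-minute buckets.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "my-endpoint"},    # placeholder
        {"Name": "VariantName", "Value": "shadow-variant"},  # placeholder
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=["Average", "Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```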
