Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Recommendation jobs with Amazon SageMaker Inference Recommender

Focus mode
Recommendation jobs with Amazon SageMaker Inference Recommender - Amazon SageMaker AI

Amazon SageMaker Inference Recommender can make two types of recommendations:

  1. Inference recommendations (Default job type) run a set of load tests on the recommended instance types. You can also load test for a serverless endpoint.. You only need to provide a model package Amazon Resource Name (ARN) to launch this type of recommendation job. Inference recommendation jobs complete within 45 minutes.

  2. Endpoint recommendations (Advanced job type) are based on a custom load test where you select your desired ML instances or a serverless endpoint, provide a custom traffic pattern, and provide requirements for latency and throughput based on your production requirements. This job takes an average of 2 hours to complete depending on the job duration set and the total number of inference configurations tested.

Both types of recommendations use the same APIs to create, describe, and stop jobs. The output is a list of instance configuration recommendations with associated environment variables, cost, throughput, and latency metrics. Recommendation jobs also provide an initial instance count, which you can use to configure an autoscaling policy. To differentiate between the two types of jobs, when you’re creating a job through either the SageMaker AI console or the APIs, specify Default to create preliminary endpoint recommendations and Advanced for custom load testing and endpoint recommendations.

Note

You do not need to do both types of recommendation jobs in your own workflow. You can do either independently of the other.

Inference Recommender can also provide you with a list of prospective instances, or the top five instance types that are optimized for cost, throughput and latency for model deployment, along with a confidence score. You can choose these instances when deploying your model. Inference Recommender automatically performs benchmarking against your model for you to provide the prospective instances. Since these are preliminary recommendations, we recommend that you run further instance recommendation jobs to get more accurate results. To view the prospective instances, go to your SageMaker AI model details page. For more information, see Get instant prospective instances.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.