Recommendation jobs with Amazon SageMaker Inference Recommender

Amazon SageMaker Inference Recommender can make two types of recommendations:

Inference recommendations (Default job type) run a set of load tests on the recommended instance types. You can also load test for a serverless endpoint. You only need to provide a model package Amazon Resource Name (ARN) to launch this type of recommendation job. Inference recommendation jobs complete within 45 minutes.
Endpoint recommendations (Advanced job type) are based on a custom load test where you select your desired ML instances or a serverless endpoint, provide a custom traffic pattern, and provide requirements for latency and throughput based on your production requirements. This job takes an average of 2 hours to complete depending on the job duration set and the total number of inference configurations tested.
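The difference between the two job types can be sketched as request payloads for the boto3 `create_inference_recommendations_job` call. This is a minimal sketch, not a definitive implementation: the helper names, the single traffic phase, the `P95` percentile, the `MaxInvocations` value, and the 7200-second duration are illustrative assumptions, and `submit` assumes that boto3 is installed and that AWS credentials and a Region are configured.

```python
from typing import Any, Dict, List


def build_default_job_request(
    job_name: str, role_arn: str, model_package_arn: str
) -> Dict[str, Any]:
    """Default job: only a model package ARN is required as input."""
    return {
        "JobName": job_name,
        "JobType": "Default",
        "RoleArn": role_arn,
        "InputConfig": {"ModelPackageVersionArn": model_package_arn},
    }


def build_advanced_job_request(
    job_name: str,
    role_arn: str,
    model_package_arn: str,
    instance_types: List[str],
    max_latency_ms: int,
    job_duration_seconds: int = 7200,  # assumed value, tune for your test
) -> Dict[str, Any]:
    """Advanced job: chosen instances, a traffic pattern, and latency limits."""
    return {
        "JobName": job_name,
        "JobType": "Advanced",
        "RoleArn": role_arn,
        "InputConfig": {
            "ModelPackageVersionArn": model_package_arn,
            "JobDurationInSeconds": job_duration_seconds,
            # One endpoint configuration per instance type to load test.
            "EndpointConfigurations": [
                {"InstanceType": instance_type} for instance_type in instance_types
            ],
            # Illustrative single-phase traffic pattern (assumed numbers).
            "TrafficPattern": {
                "TrafficType": "PHASES",
                "Phases": [
                    {
                        "InitialNumberOfUsers": 1,
                        "SpawnRate": 1,
                        "DurationInSeconds": 120,
                    }
                ],
            },
        },
        "StoppingConditions": {
            "MaxInvocations": 300,  # assumed throughput ceiling
            "ModelLatencyThresholds": [
                {"Percentile": "P95", "ValueInMilliseconds": max_latency_ms}
            ],
        },
    }


def submit(request: Dict[str, Any]) -> str:
    """Send the request to SageMaker and return the job ARN."""
    import boto3  # imported lazily so the builders work without boto3

    client = boto3.client("sagemaker")
    return client.create_inference_recommendations_job(**request)["JobArn"]
```

Keeping the payload builders separate from `submit` lets you inspect or log a request before any job is started.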

Both types of recommendations use the same APIs to create, describe, and stop jobs. The output is a list of instance configuration recommendations with associated environment variables, cost, throughput, and latency metrics. Recommendation jobs also provide an initial instance count, which you can use to configure an autoscaling policy. To differentiate between the two types of jobs, when you’re creating a job through either the SageMaker AI console or the APIs, specify Default to create preliminary endpoint recommendations and Advanced for custom load testing and endpoint recommendations.
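As a rough illustration of working with the job output, the following sketch describes or stops a job through boto3 and flattens the returned recommendations into rows sorted by hourly cost. The function names and the row layout are hypothetical. The response field names (`InferenceRecommendations`, `EndpointConfiguration`, `Metrics`, `ModelConfiguration`) follow the `DescribeInferenceRecommendationsJob` response shape as understood here, so verify them against the current API reference before relying on them.

```python
from typing import Any, Dict, List


def describe_job(job_name: str) -> Dict[str, Any]:
    """Fetch job status and any recommendations (needs boto3 and credentials)."""
    import boto3  # imported lazily so the summary helper works offline

    client = boto3.client("sagemaker")
    return client.describe_inference_recommendations_job(JobName=job_name)


def stop_job(job_name: str) -> None:
    """Stop a running recommendation job (needs boto3 and credentials)."""
    import boto3

    boto3.client("sagemaker").stop_inference_recommendations_job(JobName=job_name)


def summarize_recommendations(describe_response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten recommendations into rows, cheapest hourly cost first."""
    rows = []
    for rec in describe_response.get("InferenceRecommendations", []):
        endpoint = rec.get("EndpointConfiguration", {})
        metrics = rec.get("Metrics", {})
        env_params = rec.get("ModelConfiguration", {}).get("EnvironmentParameters", [])
        rows.append(
            {
                "instance_type": endpoint.get("InstanceType"),
                # Starting point for an autoscaling policy's minimum capacity.
                "initial_instance_count": endpoint.get("InitialInstanceCount"),
                "cost_per_hour": metrics.get("CostPerHour", float("inf")),
                "max_invocations": metrics.get("MaxInvocations"),
                "model_latency": metrics.get("ModelLatency"),
                "environment": {p["Key"]: p["Value"] for p in env_params},
            }
        )
    return sorted(rows, key=lambda row: row["cost_per_hour"])
```

Calling `summarize_recommendations(describe_job("my-job"))` on a completed job would then list the cheapest configuration first; `"my-job"` is a placeholder name.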

Note

You do not need to do both types of recommendation jobs in your own workflow. You can do either independently of the other.

Inference Recommender can also provide a list of prospective instances: the top five instance types optimized for cost, throughput, and latency for model deployment, each with a confidence score. You can choose these instances when deploying your model. Inference Recommender automatically benchmarks your model to produce this list. Because these are preliminary recommendations, we recommend that you run further instance recommendation jobs to get more accurate results. To view the prospective instances, go to your SageMaker AI model details page. For more information, see Get instant prospective instances.

Topics

Prerequisites

Get instant prospective instances