Get an instance recommendation
Instance recommendation jobs run a set of load tests on recommended instance types. These jobs produce performance metrics that are based on load tests using the sample data you provided during model version registration.
Note
Before you create an Inference Recommender recommendation job, make sure you have satisfied the Prerequisites.
The following sections demonstrate how to use Amazon SageMaker Inference Recommender to create an instance recommendation based on your model type using the AWS SDK for Python (Boto3), the AWS CLI, or Amazon SageMaker Studio.
Create an instance recommendation
Create an instance recommendation programmatically using the AWS SDK for Python (Boto3) or the AWS CLI, or interactively using Studio. Specify a job name for your instance recommendation, an AWS IAM role ARN, an input configuration, and either a model package ARN from when you registered your model with Model Registry, or your model name and a ContainerConfig dictionary from when you created your model in the Prerequisites section.
- AWS SDK for Python (Boto3)

Use the CreateInferenceRecommendationsJob API to get an instance endpoint recommendation. Set the JobType field to 'Default' for instance endpoint recommendation jobs. In addition, provide the following:

- The Amazon Resource Name (ARN) of an IAM role that enables Inference Recommender to perform tasks on your behalf. Define this for the RoleArn field.
- Inference Recommender supports either one model package ARN or a model name as input. Specify one of the following:
  - The ARN of the versioned model package you created when you registered your model with the model registry. Define this for ModelPackageVersionArn in the InputConfig field.
  - The name of the model you created. Define this for ModelName in the InputConfig field. Also, provide the ContainerConfig dictionary, which includes the required fields that need to be provided with the model name. Define this for ContainerConfig in the InputConfig field.
- Provide a name for your Inference Recommender recommendation job for the JobName field. The Inference Recommender job name must be unique within the AWS Region and within your AWS account.
Import the AWS SDK for Python (Boto3) package and create a SageMaker client object using the client class. If you followed the steps in the Prerequisites section, specify only one of the following:

- Option 1: If you would like to create an inference recommendations job with a model package ARN, store the model package ARN in a variable named model_package_arn.
- Option 2: If you would like to create an inference recommendations job with a model name and ContainerConfig, store the model name in a variable named model_name and the ContainerConfig dictionary in a variable named container_config.
# Create a low-level SageMaker service client.
import boto3

aws_region = '<INSERT>'
sagemaker_client = boto3.client('sagemaker', region_name=aws_region)

# Provide only one of model package ARN or model name, not both.
# Provide your model package ARN that was created when you registered your
# model with Model Registry.
model_package_arn = '<INSERT>'

## Uncomment if you would like to create an inference recommendations job with a
## model name instead of a model package ARN, and comment out model_package_arn above.
## Provide your model name
# model_name = '<INSERT>'
## Provide your container config
# container_config = '<INSERT>'

# Provide a unique job name for the SageMaker Inference Recommender job.
job_name = '<INSERT>'

# Inference Recommender job type. Set to Default to get an initial recommendation.
job_type = 'Default'

# Provide an IAM role that gives SageMaker Inference Recommender permission to
# access AWS services.
role_arn = 'arn:aws:iam::<account>:role/*'

sagemaker_client.create_inference_recommendations_job(
    JobName = job_name,
    JobType = job_type,
    RoleArn = role_arn,
    # Provide only one of model package ARN or model name, not both.
    # If you would like to create an inference recommendations job with a model name,
    # uncomment ModelName and ContainerConfig, and comment out ModelPackageVersionArn.
    InputConfig = {
        'ModelPackageVersionArn': model_package_arn
        # 'ModelName': model_name,
        # 'ContainerConfig': container_config
    }
)

See the Amazon SageMaker API Reference Guide for a full list of optional and required arguments you can pass to CreateInferenceRecommendationsJob.
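If you take the model name path, note that container_config is a dictionary rather than a string. The following is a minimal sketch of what that dictionary might look like, mirroring the fields used in the AWS CLI example later in this section; the domain, framework, nearest model, payload location, input shape, and task values are illustrative placeholders:

# A sketch of a ContainerConfig dictionary for the model name path.
# All values below are illustrative placeholders; replace them with
# values that describe your own model and sample payload.
container_config = {
    'Domain': 'COMPUTER_VISION',
    'Framework': 'PYTORCH',
    'FrameworkVersion': '1.7.1',
    'NearestModelName': 'resnet18',
    'PayloadConfig': {
        'SamplePayloadUrl': 's3://<bucket>/<payload_s3_key>',
        'SupportedContentTypes': ['image/jpeg']
    },
    'DataInputConfig': '[[1,3,256,256]]',
    'Task': 'IMAGE_CLASSIFICATION'
}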
- AWS CLI
Use the create-inference-recommendations-job API to get an instance endpoint recommendation. Set the job-type field to 'Default' for instance endpoint recommendation jobs. In addition, provide the following:

- The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker Inference Recommender to perform tasks on your behalf. Define this for the role-arn field.
- Inference Recommender supports either one model package ARN or a model name as input. Specify one of the following:
  - The ARN of the versioned model package you created when you registered your model with Model Registry. Define this for ModelPackageVersionArn in the input-config field.
  - The name of the model you created. Define this for ModelName in the input-config field. Also, provide the ContainerConfig dictionary, which includes the required fields that need to be provided with the model name. Define this for ContainerConfig in the input-config field.
- Provide a name for your Inference Recommender recommendation job for the job-name field. The Inference Recommender job name must be unique within the AWS Region and within your AWS account.
To create an inference recommendation job with a model package ARN, use the following example:
aws sagemaker create-inference-recommendations-job \
    --region <region> \
    --job-name <job_name> \
    --job-type Default \
    --role-arn arn:aws:iam::<account>:role/* \
    --input-config "{
        \"ModelPackageVersionArn\": \"arn:aws:sagemaker:<region>:<account>:model-package/<resource-id>\"
    }"

To create an inference recommendation job with a model name and ContainerConfig, use the following example:

aws sagemaker create-inference-recommendations-job \
    --region <region> \
    --job-name <job_name> \
    --job-type Default \
    --role-arn arn:aws:iam::<account>:role/* \
    --input-config "{
        \"ModelName\": \"model-name\",
        \"ContainerConfig\": {
            \"Domain\": \"COMPUTER_VISION\",
            \"Framework\": \"PYTORCH\",
            \"FrameworkVersion\": \"1.7.1\",
            \"NearestModelName\": \"resnet18\",
            \"PayloadConfig\": {
                \"SamplePayloadUrl\": \"s3://{bucket}/{payload_s3_key}\",
                \"SupportedContentTypes\": [\"image/jpeg\"]
            },
            \"DataInputConfig\": \"[[1,3,256,256]]\",
            \"Task\": \"IMAGE_CLASSIFICATION\"
        }
    }"

- Amazon SageMaker Studio
Create an instance recommendation job in Studio.

- In your Studio application, choose the home icon.
- In the left sidebar of Studio, choose Models.
- Choose Model Registry from the dropdown list to display models you have registered with the model registry. The left panel displays a list of model groups. The list includes all the model groups registered with the model registry in your account, including models registered outside of Studio.
- Select the name of your model group. When you select your model group, the right pane of Studio displays column headings such as Versions and Setting. If you have one or more model packages within your model group, you see a list of those model packages within the Versions column.
- Choose the Inference recommender column.
- Choose an IAM role that grants Inference Recommender permission to access AWS services. You can create a role and attach the AmazonSageMakerFullAccess IAM managed policy to accomplish this, or you can let Studio create a role for you.
- Choose Get recommendations.

The instance recommendation can take up to 45 minutes.

Warning
Do not close this tab. If you close this tab, you cancel the instance recommendation job.
Get your instance recommendation job results
Collect the results of your instance recommendation job programmatically with the AWS SDK for Python (Boto3), the AWS CLI, or Studio.
- AWS SDK for Python (Boto3)
Once an instance recommendation is complete, you can use DescribeInferenceRecommendationsJob to get the job details and recommended instance types. Provide the job name that you used when you created the instance recommendation job.

job_name = '<INSERT>'

response = sagemaker_client.describe_inference_recommendations_job(
    JobName=job_name
)
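An instance recommendation job can take up to 45 minutes, so you may want to wait for it to finish before reading the results. The following is a minimal polling sketch; the terminal status values it checks for (COMPLETED, FAILED, STOPPED) are an assumption based on the statuses shown on this page and the stop operation described later:

import time

# Poll the job status once per minute until it reaches a terminal state.
# The set of terminal statuses checked here is an assumption.
while True:
    response = sagemaker_client.describe_inference_recommendations_job(JobName=job_name)
    if response['Status'] in ('COMPLETED', 'FAILED', 'STOPPED'):
        break
    time.sleep(60)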
Print the response object. The previous code sample stored the response in a variable named response.

print(response)
This returns a JSON response similar to the following:
{
    'JobName': 'job-name',
    'JobDescription': 'job-description',
    'JobType': 'Default',
    'JobArn': 'arn:aws:sagemaker:region:account-id:inference-recommendations-job/resource-id',
    'Status': 'COMPLETED',
    'CreationTime': datetime.datetime(2021, 10, 26, 20, 4, 57, 627000, tzinfo=tzlocal()),
    'LastModifiedTime': datetime.datetime(2021, 10, 26, 20, 25, 1, 997000, tzinfo=tzlocal()),
    'InputConfig': {
        'ModelPackageVersionArn': 'arn:aws:sagemaker:region:account-id:model-package/resource-id',
        'JobDurationInSeconds': 0
    },
    'InferenceRecommendations': [{
        'Metrics': {
            'CostPerHour': 0.20399999618530273,
            'CostPerInference': 5.246913588052848e-06,
            'MaxInvocations': 648,
            'ModelLatency': 263596
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5.xlarge',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 2.92620870823157e-06,
            'MaxInvocations': 655,
            'ModelLatency': 826019
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5d.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 3.3625731248321244e-06,
            'MaxInvocations': 570,
            'ModelLatency': 1085446
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }],
    'ResponseMetadata': {
        'RequestId': 'request-id',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': 'x-amzn-requestid',
            'content-type': 'content-type',
            'content-length': '1685',
            'date': 'Tue, 26 Oct 2021 20:31:10 GMT'
        },
        'RetryAttempts': 0
    }
}

The first few lines provide information about the instance recommendation job itself. This includes the job name, job ARN, status, and creation and last modified times.
The InferenceRecommendations dictionary contains a list of Inference Recommender instance recommendations.

The EndpointConfiguration nested dictionary contains the instance type (InstanceType) recommendation along with the endpoint and variant name (a deployed AWS machine learning model) that was used during the recommendation job. You can use the endpoint and variant name for monitoring in Amazon CloudWatch. See Monitor Amazon SageMaker with Amazon CloudWatch for more information.
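As an illustration of that monitoring path, the following sketch retrieves the ModelLatency metric for the endpoint and variant from the first recommendation. It reuses the aws_region variable and the response object from the earlier snippets; the AWS/SageMaker namespace and the EndpointName and VariantName dimensions are the ones SageMaker publishes for endpoints, and the one-hour window and five-minute period are arbitrary example values:

import datetime
import boto3

# Query CloudWatch for the ModelLatency metric of the endpoint and variant
# from the first recommendation. Assumes 'response' holds the
# describe_inference_recommendations_job result shown above.
cloudwatch = boto3.client('cloudwatch', region_name=aws_region)
endpoint_config = response['InferenceRecommendations'][0]['EndpointConfiguration']

now = datetime.datetime.utcnow()
metrics = cloudwatch.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='ModelLatency',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': endpoint_config['EndpointName']},
        {'Name': 'VariantName', 'Value': endpoint_config['VariantName']},
    ],
    StartTime=now - datetime.timedelta(hours=1),  # example window
    EndTime=now,
    Period=300,  # 5-minute buckets
    Statistics=['Average'],
)
print(metrics['Datapoints'])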
The Metrics nested dictionary contains information about the estimated cost per hour (CostPerHour) for your real-time endpoint in US dollars, the estimated cost per inference (CostPerInference) in US dollars for your real-time endpoint, the expected maximum number of InvokeEndpoint requests per minute sent to the endpoint (MaxInvocations), and the model latency (ModelLatency), which is the interval of time (in microseconds) that your model took to respond to SageMaker. The model latency includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.
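To compare the recommendations quickly, the following sketch sorts them by estimated hourly cost and prints one summary line per instance type. It also recomputes the approximate cost per inference as CostPerHour divided by (MaxInvocations × 60), which you can check against the sample values above (for example, 0.204 / (648 × 60) ≈ 5.25e-06):

# Summarize the recommendations, cheapest hourly cost first. Assumes
# 'response' holds the describe_inference_recommendations_job result above.
recommendations = sorted(
    response['InferenceRecommendations'],
    key=lambda rec: rec['Metrics']['CostPerHour'],
)

for rec in recommendations:
    metrics = rec['Metrics']
    instance_type = rec['EndpointConfiguration']['InstanceType']
    # Approximate cost per inference: hourly cost spread over one hour of
    # traffic at the maximum invocations per minute.
    cost_per_inference = metrics['CostPerHour'] / (metrics['MaxInvocations'] * 60)
    print(instance_type, metrics['CostPerHour'], metrics['MaxInvocations'],
          metrics['ModelLatency'], cost_per_inference)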
- AWS CLI
Once an instance recommendation is complete, you can use describe-inference-recommendations-job to get the job details and recommended instance types. Provide the job name that you used when you created the instance recommendation job.

aws sagemaker describe-inference-recommendations-job \
    --job-name <job-name> \
    --region <aws-region>
The JSON response should resemble the following:
{
    'JobName': 'job-name',
    'JobDescription': 'job-description',
    'JobType': 'Default',
    'JobArn': 'arn:aws:sagemaker:region:account-id:inference-recommendations-job/resource-id',
    'Status': 'COMPLETED',
    'CreationTime': datetime.datetime(2021, 10, 26, 20, 4, 57, 627000, tzinfo=tzlocal()),
    'LastModifiedTime': datetime.datetime(2021, 10, 26, 20, 25, 1, 997000, tzinfo=tzlocal()),
    'InputConfig': {
        'ModelPackageVersionArn': 'arn:aws:sagemaker:region:account-id:model-package/resource-id',
        'JobDurationInSeconds': 0
    },
    'InferenceRecommendations': [{
        'Metrics': {
            'CostPerHour': 0.20399999618530273,
            'CostPerInference': 5.246913588052848e-06,
            'MaxInvocations': 648,
            'ModelLatency': 263596
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5.xlarge',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 2.92620870823157e-06,
            'MaxInvocations': 655,
            'ModelLatency': 826019
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.c5d.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }, {
        'Metrics': {
            'CostPerHour': 0.11500000208616257,
            'CostPerInference': 3.3625731248321244e-06,
            'MaxInvocations': 570,
            'ModelLatency': 1085446
        },
        'EndpointConfiguration': {
            'EndpointName': 'endpoint-name',
            'VariantName': 'variant-name',
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        },
        'ModelConfiguration': {
            'Compiled': False,
            'EnvironmentParameters': []
        }
    }],
    'ResponseMetadata': {
        'RequestId': 'request-id',
        'HTTPStatusCode': 200,
        'HTTPHeaders': {
            'x-amzn-requestid': 'x-amzn-requestid',
            'content-type': 'content-type',
            'content-length': '1685',
            'date': 'Tue, 26 Oct 2021 20:31:10 GMT'
        },
        'RetryAttempts': 0
    }
}

The first few lines provide information about the instance recommendation job itself. This includes the job name, job ARN, status, and creation and last modified times.
The InferenceRecommendations dictionary contains a list of Inference Recommender instance recommendations.

The EndpointConfiguration nested dictionary contains the instance type (InstanceType) recommendation along with the endpoint and variant name (a deployed AWS machine learning model) used during the recommendation job. You can use the endpoint and variant name for monitoring in Amazon CloudWatch. See Monitor Amazon SageMaker with Amazon CloudWatch for more information.
The Metrics nested dictionary contains information about the estimated cost per hour (CostPerHour) for your real-time endpoint in US dollars, the estimated cost per inference (CostPerInference) in US dollars for your real-time endpoint, the expected maximum number of InvokeEndpoint requests per minute sent to the endpoint (MaxInvocations), and the model latency (ModelLatency), which is the interval of time (in microseconds) that your model took to respond to SageMaker. The model latency includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.

For more information about interpreting the results of your recommendation job, see Interpret recommendation results.
- Amazon SageMaker Studio
The instance recommendations populate in a new Inference recommendations tab within Studio. It can take up to 45 minutes for the results to appear. This tab contains Results and Details column headings.
The Details column provides information about the instance recommendation job, such as the name of the instance recommendation, when the job was created (Creation time), and more. It also provides Settings information, such as the maximum number of invocations that occurred per minute and information about the Amazon Resource Names used.
The Results column provides a Deployment goals and SageMaker recommendations window in which you can adjust the order that the results are displayed based on deployment importance. There are three dropdown menus that you can use to provide the level of importance of the Cost, Latency, and Throughput for your use case. For each goal (cost, latency, and throughput), you can set the level of importance: Lowest Importance, Low Importance, Moderate importance, High importance, or Highest importance.
Based on your selections of importance for each goal, Inference Recommender displays its top recommendation in the SageMaker recommendation field on the right of the panel, along with the estimated cost per hour and per inference request. It also provides information about the expected model latency, maximum number of invocations, and the number of instances.
In addition to the top recommendation, you can also see the same information for all of the instances that Inference Recommender tested in the All runs section.
Stop your instance endpoint recommendation
Stop your Inference Recommender instance recommendation jobs programmatically with the StopInferenceRecommendationsJob API, with the AWS CLI, or interactively with Studio.
- AWS SDK for Python (Boto3)

Specify the name of the instance recommendation job for the JobName field:

sagemaker_client.stop_inference_recommendations_job(
    JobName='<INSERT>'
)
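Stopping may not be instantaneous, so the job can pass through an intermediate state before it is fully stopped. As a sketch, you can confirm the final state with the describe call from the results section; treating STOPPED as the terminal status after a stop request is an assumption:

# Confirm the job stopped. Reuses describe_inference_recommendations_job
# from the results section; expecting 'STOPPED' as the final status after
# a stop request is an assumption.
response = sagemaker_client.describe_inference_recommendations_job(
    JobName='<INSERT>'
)
print(response['Status'])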
- AWS CLI
Specify the job name of the instance recommendation job for the job-name flag:

aws sagemaker stop-inference-recommendations-job \
    --job-name <job-name>
- Amazon SageMaker Studio
Close the tab in which you initiated the instance recommendation to stop your Inference Recommender instance recommendation.