Deploy a Model

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model using Amazon SageMaker hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models to ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances.

For Inferentia and Trainium instances, models must be compiled specifically for those instance types. Models compiled for other instance types are not guaranteed to work with Inferentia or Trainium instances.
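
The following is a minimal sketch of compiling for the Inferentia target family with the SageMaker SDK for Python. The S3 paths, role, entry point, framework version, and input shape are illustrative assumptions, not values from this page.

import sagemaker
from sagemaker.pytorch import PyTorchModel

role = sagemaker.get_execution_role()  # or your IAM role ARN

# Assumed model artifact and inference script locations.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model/model.tar.gz",
    role=role,
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
)

# Compile specifically for the Inferentia instance family; a model compiled
# for another target (for example, ml_c5) is not guaranteed to run on ml.inf1.
compiled_model = pytorch_model.compile(
    target_instance_family="ml_inf1",
    input_shape={"input0": [1, 3, 224, 224]},  # assumed input shape
    output_path="s3://my-bucket/compiled/",
    role=role,
    job_name="my-inf1-compilation-job",
    framework="pytorch",
    framework_version="1.12",
)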

For Elastic Inference accelerators, models must be compiled specifically for ml_eia2 devices. For information on how to deploy your compiled model to an Elastic Inference accelerator, see Use EI on Amazon SageMaker Hosted Endpoints.
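
As a minimal sketch, assuming compiled_model was produced by a compilation job that targeted the ml_eia2 device family, attaching an Elastic Inference accelerator at deployment looks like the following; the instance and accelerator types are illustrative.

predictor = compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    accelerator_type="ml.eia2.medium",  # attach an Elastic Inference accelerator
)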

When you deploy a compiled model, the instance type of the endpoint must match the instance family you targeted during compilation. Deployment creates a SageMaker endpoint that you can use to perform inferences. You can deploy a Neo-compiled model using any of the following: the Amazon SageMaker SDK for Python, the SDK for Python (Boto3), the AWS Command Line Interface, or the SageMaker console.
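
For example, here is a minimal deployment sketch with the SageMaker SDK for Python, assuming the model was compiled with target_instance_family="ml_c5"; the endpoint must therefore use a c5 instance type.

predictor = compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",  # matches the ml_c5 compilation target
)

# The returned predictor wraps the new HTTPS endpoint; the payload format
# depends on your serving container.
# result = predictor.predict(payload)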

Note

To deploy a model using the AWS CLI, the console, or Boto3, see Neo Inference Container Images to select the inference image URI for your primary container.
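
As an illustration of the Boto3 path, the following sketch creates the model, endpoint configuration, and endpoint in sequence. The image URI, artifact location, role ARN, Region, and resource names are placeholders; look up the correct Neo inference container image URI for your framework and Region.

import boto3

sm = boto3.client("sagemaker", region_name="us-west-2")

# The primary container uses a Neo inference image URI (placeholder below).
sm.create_model(
    ModelName="my-neo-model",
    PrimaryContainer={
        "Image": "<neo-inference-image-uri>",
        "ModelDataUrl": "s3://my-bucket/compiled/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
)

sm.create_endpoint_config(
    EndpointConfigName="my-neo-endpoint-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "my-neo-model",
            "InstanceType": "ml.c5.xlarge",  # same family as the compilation target
            "InitialInstanceCount": 1,
        }
    ],
)

sm.create_endpoint(
    EndpointName="my-neo-endpoint",
    EndpointConfigName="my-neo-endpoint-config",
)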