Deploy Models for Inference - Amazon SageMaker

After you build and train your models, you can deploy them to get predictions in one of two ways:

  • To set up a persistent endpoint to get predictions from your models, use Amazon SageMaker hosting services. For an example of how to deploy a model to the SageMaker hosting service, see Create your endpoint and deploy your model.

  • To get predictions for an entire dataset, use SageMaker batch transform. For an overview of deploying a model with SageMaker batch transform, see Use Batch Transform.

    For an example of how to deploy a model with batch transform, see (Optional) Make Prediction with Batch Transform.


These topics assume that you have built and trained one or more machine learning models and are ready to deploy them. If you are new to SageMaker and have not completed these prerequisite tasks, work through the steps in the Get Started with Amazon SageMaker tutorial to familiarize yourself with an example of how SageMaker manages the data science process and how it handles model deployment. For more information about training a model, see Train Models.
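
The two deployment options described above can be sketched with the low-level SageMaker API. This is a minimal illustration, not the tutorial's own code: the helper names (`endpoint_requests`, `batch_transform_request`, `deploy_endpoint`), the default instance type, and every model name, image URI, role ARN, and S3 location passed to them are hypothetical placeholders. The operation names (`create_model`, `create_endpoint_config`, `create_endpoint`, `create_transform_job`) are those exposed by the boto3 SageMaker client. The helpers only build request payloads, so they can be inspected without an AWS account.

```python
from typing import Any, Dict


def endpoint_requests(model_name: str, image_uri: str, model_data_url: str,
                      role_arn: str,
                      instance_type: str = "ml.m5.large") -> Dict[str, Dict[str, Any]]:
    """Build the three payloads needed for a persistent hosting endpoint."""
    config_name = f"{model_name}-config"  # hypothetical naming scheme
    return {
        # 1. Register the trained artifact and its inference container.
        "create_model": {
            "ModelName": model_name,
            "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
            "ExecutionRoleArn": role_arn,
        },
        # 2. Describe the instances that will serve the model.
        "create_endpoint_config": {
            "EndpointConfigName": config_name,
            "ProductionVariants": [{
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": 1,
                "InstanceType": instance_type,
            }],
        },
        # 3. Launch the endpoint from that configuration.
        "create_endpoint": {
            "EndpointName": f"{model_name}-endpoint",
            "EndpointConfigName": config_name,
        },
    }


def batch_transform_request(model_name: str, input_s3_uri: str, output_s3_uri: str,
                            instance_type: str = "ml.m5.large") -> Dict[str, Any]:
    """Build the payload for a batch transform job over an entire dataset."""
    return {
        "TransformJobName": f"{model_name}-batch",
        "ModelName": model_name,  # the model must already be registered
        "TransformInput": {
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": input_s3_uri}},
            "ContentType": "text/csv",  # assumes CSV input, one record per line
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": 1},
    }


def deploy_endpoint(client: Any, requests: Dict[str, Dict[str, Any]]) -> None:
    """Issue the endpoint calls in the order the service expects."""
    for operation in ("create_model", "create_endpoint_config", "create_endpoint"):
        getattr(client, operation)(**requests[operation])
```

With real credentials, `client` would typically be `boto3.client("sagemaker")`. Passing the result of `endpoint_requests(...)` to `deploy_endpoint` sets up the persistent endpoint, while `client.create_transform_job(**batch_transform_request(...))` starts a one-off batch job against a model that has already been created.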

What do you want to do?

SageMaker provides features to manage resources and optimize inference performance when deploying machine learning models. For guidance on using inference pipelines, compiling and deploying models with Neo, using Elastic Inference, and scaling models automatically, see the following topics.

Manage Model Deployments

For guidance on managing model deployments, including monitoring, troubleshooting, and best practices, and for information on storage associated with inference hosting instances:

Deploy Your Own Inference Code

For developers who need more advanced guidance on running their own inference code: