
Deploy a Model

For an overview of deploying a model with Amazon SageMaker, see Deploy a Model in Amazon SageMaker.
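As a quick illustration, the following sketch uses the SageMaker Python SDK to host a trained model artifact on a real-time endpoint and then invoke it. The image URI, S3 path, role ARN, instance type, and endpoint name are placeholders; substitute values from your own account.

# Minimal deployment sketch using the SageMaker Python SDK (v2).
# All bracketed values are placeholders, not real resources.
import boto3
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="<ecr-inference-image-uri>",             # container that serves the model
    model_data="s3://<bucket>/<prefix>/model.tar.gz",  # trained model artifacts
    role="<execution-role-arn>",
    sagemaker_session=session,
)

# Creates the SageMaker model, the endpoint configuration, and the endpoint.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="my-model-endpoint",
)

# Send a request to the hosted endpoint.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",
    ContentType="text/csv",
    Body="1.0,2.0,3.0",
)
print(response["Body"].read())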

Amazon SageMaker provides features to manage resources and optimize inference performance when deploying machine learning models. For guidance on using inference pipelines, Neo, Elastic Inference, automatic scaling, and hosting instance storage volumes, and for deployment best practices, see the following topics.

  • For guidance on using an Amazon SageMaker inference pipeline to manage data processing and real-time predictions, or to process batch transforms, in a series of Docker containers, see Deploy an Inference Pipeline. A brief sketch follows this list.

  • For guidance on using Amazon SageMaker Neo to enable machine learning models to train once and run anywhere in the cloud, see Amazon SageMaker Neo. An example appears after this list.

  • For guidance on using Elastic Inference (EI) to increase the throughput and decrease the latency of real-time inferences from deep learning models that are deployed as Amazon SageMaker hosted models, at a fraction of the cost of using a GPU instance for your endpoint, see Amazon SageMaker Elastic Inference (EI). See the sketch after this list.

  • For guidance on using automatic scaling to dynamically adjust the number of instances provisioned for a production variant in response to changes in your workload, see Automatically Scale Amazon SageMaker Models. A scaling policy example appears after this list.

  • For information about the size of storage volumes on different sizes of hosting instances, see Hosting Instance Storage Volumes.

  • For guidance on best practices to use for model deployment, see Best Practices for Deploying Amazon SageMaker Models.
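The following is a minimal sketch of an inference pipeline built with the SageMaker Python SDK's PipelineModel. The two container images, artifact locations, role, and endpoint name are placeholders; requests to the endpoint pass through the containers in the order listed.

# Inference pipeline sketch: two containers served behind one endpoint.
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

role = "<execution-role-arn>"

# Container 1: data preprocessing (for example, a serialized feature pipeline).
preprocess_model = Model(
    image_uri="<preprocessing-image-uri>",
    model_data="s3://<bucket>/preprocess/model.tar.gz",
    role=role,
)

# Container 2: the predictor itself (for example, an XGBoost model).
predict_model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://<bucket>/predict/model.tar.gz",
    role=role,
)

# Requests flow through the containers in this order.
pipeline_model = PipelineModel(
    name="preprocess-then-predict",
    role=role,
    models=[preprocess_model, predict_model],
)

pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
    endpoint_name="inference-pipeline-endpoint",
)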
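The following sketch compiles a model with Neo before deployment. It assumes the model artifacts come from a Neo-supported framework (MXNet is used here only as an example); the input shape, framework version, S3 paths, and role are placeholders.

# Neo compilation sketch: compile for a target instance family, then deploy.
from sagemaker.model import Model

model = Model(
    image_uri="<framework-inference-image-uri>",
    model_data="s3://<bucket>/trained/model.tar.gz",
    role="<execution-role-arn>",
)

compiled_model = model.compile(
    target_instance_family="ml_c5",             # compile for ml.c5 hosting instances
    input_shape={"data": [1, 3, 224, 224]},     # example input tensor shape
    output_path="s3://<bucket>/neo-compiled/",  # where compiled artifacts are written
    role="<execution-role-arn>",
    framework="mxnet",                          # placeholder framework and version
    framework_version="1.8",
)

compiled_model.deploy(
    initial_instance_count=1,
    instance_type="ml.c5.xlarge",
)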
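The following sketch attaches an Elastic Inference accelerator at deployment time. It assumes a model object like the one in the basic deployment sketch above, built from a framework container that supports EI; the accelerator and instance types are placeholders.

# Elastic Inference sketch: CPU host instance plus an attached EI accelerator.
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",       # CPU instance hosting the endpoint
    accelerator_type="ml.eia2.medium",  # Elastic Inference accelerator attached to it
)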
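The following sketch registers a production variant for target-tracking auto scaling with the Application Auto Scaling API. The endpoint name, variant name, capacity limits, and target value are placeholders chosen for illustration.

# Automatic scaling sketch: scale a variant on invocations per instance.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/my-model-endpoint/variant/AllTraffic"

# Register the variant's instance count as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Track a target of roughly 70 invocations per minute per instance.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)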