
Amazon SageMaker Inference Pipelines

An inference pipeline is an Amazon SageMaker model composed of a linear sequence of two to five containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. Inference pipelines are fully managed and can combine preprocessing, predictions, and post-processing as part of a data science process. Pipeline model invocations are handled as a sequence of HTTP requests: the first container in the pipeline handles the initial request, the intermediate response is sent as a request to the second container, and so on for each container in the pipeline, with the final response returned to the client. In particular, you can add SparkML Model Serving and scikit-learn containers that reuse the data transformers developed for training models. The entire assembled inference pipeline is itself an Amazon SageMaker model that you can use either to make real-time predictions or to process batch transforms directly, without any external preprocessing.
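
Because a deployed pipeline behaves like any other SageMaker endpoint, clients invoke it in the usual way and the container chaining happens server-side. The following minimal sketch uses boto3; the endpoint name and the CSV payload are hypothetical placeholders, not values from the samples described here.

import boto3

# The SageMaker runtime client sends inference requests to deployed endpoints.
runtime = boto3.client("sagemaker-runtime")

# "my-pipeline-endpoint" and the CSV payload are hypothetical. The request is
# handled by the first container in the pipeline; each intermediate response
# is forwarded as a request to the next container.
response = runtime.invoke_endpoint(
    EndpointName="my-pipeline-endpoint",
    ContentType="text/csv",
    Accept="text/csv",
    Body="0.33,0.12,0.98,1.0",
)

# The response body is the output of the last container in the pipeline.
print(response["Body"].read().decode("utf-8"))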

When the pipeline model is deployed, the full set of containers is installed and run on each EC2 instance in the endpoint or transform job. Feature processing and inferences run with low latency because the containers in an inference pipeline are co-located on the same EC2 instance. You define the containers for a pipeline model with the CreateModel operation: instead of setting a single PrimaryContainer, you set multiple Containers that make up the pipeline, and you specify the order in which the containers are executed when you create the inference pipeline model. Although a pipeline model is immutable, you can update an inference pipeline by deploying a new model through the regular UpdateEndpoint process. This provides modularity in the development of your machine learning workflows and greater flexibility during experimentation. There is no additional cost for using this feature; you pay only for the instances running behind an endpoint.
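
The following boto3 sketch shows the shape of such a CreateModel call with a Containers list. The model name, role ARN, image URIs, and S3 paths are hypothetical placeholders, and the example assumes a SparkML preprocessing container feeding an XGBoost prediction container.

import boto3

sagemaker = boto3.client("sagemaker")

# All names, ARNs, image URIs, and S3 paths below are hypothetical placeholders.
# The order of the Containers list is the order in which requests flow.
sagemaker.create_model(
    ModelName="my-inference-pipeline",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    Containers=[
        {   # Step 1: feature preprocessing (e.g., a SparkML Serving container)
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-sparkml-serving:2.4",
            "ModelDataUrl": "s3://my-bucket/sparkml/model.tar.gz",
        },
        {   # Step 2: prediction (e.g., the built-in XGBoost container)
            "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
            "ModelDataUrl": "s3://my-bucket/xgboost/model.tar.gz",
        },
    ],
)

To update a running pipeline, you create a new model and a new endpoint configuration that references it, and then call update_endpoint with the endpoint name and the new configuration name; the endpoint is updated in place.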

Sample Notebooks for Amazon SageMaker Inference Pipelines

For an end-to-end sample that uploads and processes a dataset, trains a model, and builds a pipeline model, see the Inference Pipelines with SparkML and XGBoost on Abalone sample notebook. This notebook demonstrates how to build a machine learning pipeline by using Spark feature transformers and the Amazon SageMaker XGBoost algorithm. After the model has been trained, the sample shows how to deploy the pipeline (feature transformer and XGBoost) for real-time predictions and also runs a batch transform job against the same pipeline. For additional examples that create and deploy inference pipelines, see the Inference Pipelines with SparkML and BlazingText on DBPedia and Training using SparkML on EMR and hosting on SageMaker sample notebooks. For instructions on how to create and access Jupyter notebook instances that you can use to run these examples in Amazon SageMaker, see Use Notebook Instances. After you have created a notebook instance and opened it, choose the SageMaker Examples tab to see a list of all the Amazon SageMaker samples. The first two inference pipeline notebooks are located in the advanced_functionality folder, and the third notebook is in the sagemaker-python-sdk folder. To open a notebook, choose its Use tab, and then choose Create copy.
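
The notebooks use the SageMaker Python SDK, whose PipelineModel class wraps the multi-container CreateModel call described earlier. The following is a minimal sketch of the deploy-then-batch-transform flow, assuming SageMaker Python SDK v2; the role ARN, image URI, S3 paths, and resource names are hypothetical placeholders rather than values taken from the notebooks.

from sagemaker import Session
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
from sagemaker.sparkml.model import SparkMLModel

session = Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # hypothetical role ARN

# Container 1: a SparkML feature transformer trained elsewhere
# (the artifact path is a placeholder).
sparkml_model = SparkMLModel(
    model_data="s3://my-bucket/sparkml/model.tar.gz",
    role=role,
    sagemaker_session=session,
)

# Container 2: an XGBoost predictor
# (image URI and artifact path are placeholders).
xgb_model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest",
    model_data="s3://my-bucket/xgboost/model.tar.gz",
    role=role,
    sagemaker_session=session,
)

# Chain both containers into a single pipeline model; the list order is
# the order in which requests flow.
pipeline_model = PipelineModel(
    name="abalone-inference-pipeline",
    role=role,
    models=[sparkml_model, xgb_model],
    sagemaker_session=session,
)

# Real-time hosting: both containers are co-located on each endpoint instance.
pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="abalone-pipeline-endpoint",
)

# Batch transform against the same pipeline model.
transformer = pipeline_model.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/batch-output/",
)
transformer.transform(data="s3://my-bucket/batch-input/", content_type="text/csv")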
