Supported features - Amazon SageMaker AI

Supported features

Amazon SageMaker AI offers the following four options to deploy models for inference.

  • Real-time inference for inference workloads with real-time, interactive, low latency requirements.

  • Batch transform for offline inference with large datasets.

  • Asynchronous inference for near-real-time inference with large inputs that require longer preprocessing times.

  • Serverless inference for inference workloads that have idle periods between traffic spurts.

The following table summarizes the core platform features that are supported by each inference option. It does not show features that can be provided by frameworks, custom Docker containers, or through chaining different AWS services.

Feature Real-time inference Batch transform Asynchronous inference Serverless inference Docker containers
Autoscaling support N/A N/A
GPU support 1 1 1 1P, pre-built, BYOC
Single model N/A
Multi-model endpoint k-NN, XGBoost, Linear Learner, RCF, TensorFlow, Apache MXNet, PyTorch, scikit-learn 2
Multi-container endpoint 1P, pre-built, Extend pre-built, BYOC
Serial inference pipeline 1P, pre-built, Extend pre-built, BYOC
Inference Recommender 1P, pre-built, Extend pre-built, BYOC
Private link support N/A
Data capture/Model monitor support N/A
DLCs supported 1P, pre-built, Extend pre-built, BYOC 1P, pre-built, Extend pre-built, BYOC 1P, pre-built, Extend pre-built, BYOC 1P, pre-built, Extend pre-built, BYOC N/A
Protocols supported HTTP(S) HTTP(S) HTTP(S) HTTP(S) N/A
Payload size < 6 MB ≤ 100 MB ≤ 1 GB ≤ 4 MB
HTTP chunked encoding Framework dependent, 1P not supported N/A Framework dependent, 1P not supported Framework dependent, 1P not supported N/A
Request timeout < 60 seconds Days < 1 hour < 60 seconds N/A
Deployment guardrails: blue/green deployments N/A N/A
Deployment guardrails: rolling deployments N/A N/A
Shadow testing N/A
Scale to zero N/A N/A
Market place model packages support N/A
Virtual private cloud support N/A
Multiple production variants support N/A
Network isolation N/A
Model parallel serving support 3 3 3
Volume encryption N/A
Customer AWS KMS N/A
d instance support N/A
inf1 support

With SageMaker AI, you can deploy a single model, or multiple models behind a single inference endpoint for real-time inference. The following table summarizes the core features supported by various hosting options that come with real-time inference.

Feature Single model endpoints Multi-model endpoints Serial inference pipeline Multi-container endpoints
Autoscaling support
GPU support 1
Single model
Multi-model endpoints N/A
Multi-container endpoints N/A
Serial inference pipeline N/A
Inference Recommender
Private link support
Data capture/Model monitor support N/A N/A N/A
DLCs supported 1P, pre-built, Extend pre-built, BYOC k-NN, XGBoost, Linear Learner, RCF, TensorFlow, Apache MXNet, PyTorch, scikit-learn 2 1P, pre-built, Extend pre-built, BYOC 1P, pre-built, Extend pre-built, BYOC
Protocols supported HTTP(S) HTTP(S) HTTP(S) HTTP(S)
Payload size < 6 MB < 6 MB < 6 MB < 6 MB
Request timeout < 60 seconds < 60 seconds < 60 seconds < 60 seconds
Deployment guardrails: blue/green deployments
Deployment guardrails: rolling deployments
Shadow testing
Market place model packages support
Virtual private cloud support
Multiple production variants support
Network isolation
Model parallel serving support 3 3
Volume encryption
Customer AWS KMS
d instance support
inf1 support

1 Availability of the Amazon EC2 instance types depends on the AWS Region. For availability of instances specific to AWS, see Amazon SageMaker AI Pricing.

2 To use any other framework or algorithm, use the SageMaker AI Inference toolkit to build a container that supports multi-model endpoints.

3 With SageMaker AI, you can deploy large models (up to 500 GB) for inference. You can configure the container health check and download timeout quotas, up to 60 minutes. This will allow you to have more time to download and load your model and associated resources. For more information, see SageMaker AI endpoint parameters for large model inference. You can use SageMaker AI compatible large model Inference containers. You can also use third-party model parallelization libraries, such as Triton with FasterTransformer and DeepSpeed. You have to ensure that they are compatible with SageMaker AI.