Supported features
Amazon SageMaker AI offers the following four options to deploy models for inference:

- Real-time inference for inference workloads with real-time, interactive, low-latency requirements.
- Batch transform for offline inference on large datasets.
- Asynchronous inference for near-real-time inference with large inputs that require longer processing times.
- Serverless inference for inference workloads that have idle periods between traffic spurts.
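At the API level, these options differ mainly in how endpoint capacity is declared. The following sketch contrasts a real-time configuration with a serverless one using the request shape of the `CreateEndpointConfig` API; the helper names, endpoint-config names, and instance type are illustrative choices, not recommendations.

```python
def realtime_config(model_name):
    # Real-time inference: instance-backed, always-on capacity.
    return {
        "EndpointConfigName": model_name + "-realtime",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.large",  # example only
            "InitialInstanceCount": 1,
        }],
    }

def serverless_config(model_name):
    # Serverless inference: no instances are declared; capacity scales
    # with traffic and can scale to zero during idle periods.
    return {
        "EndpointConfigName": model_name + "-serverless",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "ServerlessConfig": {"MemorySizeInMB": 2048, "MaxConcurrency": 5},
        }],
    }

# Usage (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(**serverless_config("my-model"))
```

A serverless variant declares memory and concurrency instead of instances, which is what allows the endpoint to scale to zero between traffic spurts.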
The following table summarizes the core platform features that are supported by each inference option. It does not show features that can be provided by frameworks, custom Docker containers, or through chaining different AWS services.
Feature | Real-time inference | Batch transform | Asynchronous inference | Serverless inference | Docker containers |
---|---|---|---|---|---|
Autoscaling support | ✓ | N/A | ✓ | ✓ | N/A |
GPU support | ✓1 | ✓1 | ✓1 | N/A | 1P, pre-built, BYOC |
Single model | ✓ | ✓ | ✓ | ✓ | N/A |
Multi-model endpoint | ✓ | N/A | N/A | N/A | k-NN, XGBoost, Linear Learner, RCF, TensorFlow, Apache MXNet, PyTorch, scikit-learn 2 |
Multi-container endpoint | ✓ | N/A | N/A | N/A | 1P, pre-built, Extend pre-built, BYOC |
Serial inference pipeline | ✓ | ✓ | N/A | N/A | 1P, pre-built, Extend pre-built, BYOC |
Inference Recommender | ✓ | N/A | N/A | N/A | 1P, pre-built, Extend pre-built, BYOC |
Private link support | ✓ | ✓ | ✓ | N/A | N/A |
Data capture/Model monitor support | ✓ | N/A | ✓ | N/A | N/A |
DLCs supported | 1P, pre-built, Extend pre-built, BYOC | 1P, pre-built, Extend pre-built, BYOC | 1P, pre-built, Extend pre-built, BYOC | 1P, pre-built, Extend pre-built, BYOC | N/A |
Protocols supported | HTTP(S) | HTTP(S) | HTTP(S) | HTTP(S) | N/A |
Payload size | < 6 MB | ≤ 100 MB | ≤ 1 GB | ≤ 4 MB | N/A |
HTTP chunked encoding | Framework dependent, 1P not supported | N/A | Framework dependent, 1P not supported | Framework dependent, 1P not supported | N/A |
Request timeout | < 60 seconds | Days | < 1 hour | < 60 seconds | N/A |
Deployment guardrails: blue/green deployments | ✓ | N/A | ✓ | N/A | N/A |
Deployment guardrails: rolling deployments | ✓ | N/A | ✓ | N/A | N/A |
Shadow testing | ✓ | N/A | N/A | N/A | N/A |
Scale to zero | N/A | N/A | ✓ | ✓ | N/A |
Marketplace model packages support | ✓ | ✓ | N/A | N/A | N/A |
Virtual private cloud support | ✓ | ✓ | ✓ | N/A | N/A |
Multiple production variants support | ✓ | N/A | N/A | N/A | N/A |
Network isolation | ✓ | ✓ | N/A | N/A | N/A |
Model parallel serving support | ✓3 | ✓ | ✓3 | ✓3 | N/A |
Volume encryption | ✓ | ✓ | ✓ | ✓ | N/A |
Customer AWS KMS | ✓ | ✓ | ✓ | ✓ | N/A |
d instance support | ✓ | ✓ | ✓ | N/A | N/A |
inf1 support | ✓ | ✓ | N/A | N/A | N/A |
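The payload size and request timeout rows in the table drive how each option is invoked: real-time inference sends the payload inline in the request body, while asynchronous inference passes an Amazon S3 location so inputs up to 1 GB can be processed. A minimal sketch of the two request shapes follows; the helper names are illustrative, and the commented calls use the boto3 SageMaker Runtime API.

```python
def realtime_request(endpoint_name, payload: bytes):
    # Real-time: the payload travels inline in the request body (< 6 MB).
    return {
        "EndpointName": endpoint_name,
        "Body": payload,
        "ContentType": "application/json",
    }

def async_request(endpoint_name, s3_input_uri):
    # Asynchronous: the input is staged in S3 (up to 1 GB) and the request
    # carries only its location; the result is also written to S3.
    return {
        "EndpointName": endpoint_name,
        "InputLocation": s3_input_uri,
        "ContentType": "application/json",
    }

# Usage (requires AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# runtime.invoke_endpoint(**realtime_request("my-endpoint", b'{"x": 1}'))
# runtime.invoke_endpoint_async(**async_request("my-endpoint", "s3://my-bucket/input.json"))
```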
With SageMaker AI, you can deploy a single model or multiple models behind a single inference endpoint for real-time inference. The following table summarizes the core features supported by the various hosting options that come with real-time inference.
Feature | Single model endpoints | Multi-model endpoints | Serial inference pipeline | Multi-container endpoints |
---|---|---|---|---|
Autoscaling support | ✓ | ✓ | ✓ | ✓ |
GPU support | ✓1 | ✓ | ✓ | N/A |
Single model | ✓ | ✓ | ✓ | ✓ |
Multi-model endpoints | ✓ | ✓ | N/A | N/A |
Multi-container endpoints | ✓ | N/A | N/A | N/A |
Serial inference pipeline | ✓ | ✓ | N/A | N/A |
Inference Recommender | ✓ | N/A | N/A | N/A |
Private link support | ✓ | ✓ | ✓ | ✓ |
Data capture/Model monitor support | ✓ | N/A | N/A | N/A |
DLCs supported | 1P, pre-built, Extend pre-built, BYOC | k-NN, XGBoost, Linear Learner, RCF, TensorFlow, Apache MXNet, PyTorch, scikit-learn 2 | 1P, pre-built, Extend pre-built, BYOC | 1P, pre-built, Extend pre-built, BYOC |
Protocols supported | HTTP(S) | HTTP(S) | HTTP(S) | HTTP(S) |
Payload size | < 6 MB | < 6 MB | < 6 MB | < 6 MB |
Request timeout | < 60 seconds | < 60 seconds | < 60 seconds | < 60 seconds |
Deployment guardrails: blue/green deployments | ✓ | ✓ | ✓ | ✓ |
Deployment guardrails: rolling deployments | ✓ | ✓ | ✓ | ✓ |
Shadow testing | ✓ | N/A | N/A | N/A |
Marketplace model packages support | ✓ | N/A | N/A | N/A |
Virtual private cloud support | ✓ | ✓ | ✓ | ✓ |
Multiple production variants support | ✓ | ✓ | ✓ | N/A |
Network isolation | ✓ | ✓ | ✓ | ✓ |
Model parallel serving support | ✓3 | ✓3 | N/A | N/A |
Volume encryption | ✓ | ✓ | ✓ | ✓ |
Customer AWS KMS | ✓ | ✓ | ✓ | ✓ |
d instance support | ✓ | ✓ | ✓ | ✓ |
inf1 support | ✓ | N/A | N/A | N/A |
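With multi-model endpoints, one endpoint serves many model artifacts, and each invocation selects an artifact with the `TargetModel` parameter of the `InvokeEndpoint` API. A minimal sketch of the request shape follows; the helper name and artifact name are illustrative.

```python
def mme_request(endpoint_name, target_model, payload: bytes):
    # Multi-model endpoint invocation: TargetModel names the artifact
    # (relative to the endpoint's S3 model prefix) to run inference with.
    return {
        "EndpointName": endpoint_name,
        "TargetModel": target_model,  # e.g. "model-42.tar.gz" (illustrative)
        "Body": payload,
        "ContentType": "application/json",
    }

# Usage (requires AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# runtime.invoke_endpoint(**mme_request("my-mme", "model-42.tar.gz", b'{"x": 1}'))
```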
1 Availability of Amazon EC2 instance types depends on the AWS Region. For instance availability, see Amazon SageMaker AI Pricing.
2 To use any other framework or algorithm, use the SageMaker AI Inference toolkit to build a container that supports multi-model endpoints.
3 With SageMaker AI, you can deploy large models (up to 500 GB) for inference. You can configure the container health check and download timeout quotas up to 60 minutes, which allows more time to download and load your model and associated resources. For more information, see SageMaker AI endpoint parameters for large model inference. You can use SageMaker AI compatible large model inference containers.
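The quotas in footnote 3 are set through the production-variant fields of the `CreateEndpointConfig` API. The sketch below raises the model download and container health check timeouts to the 60-minute maximum noted above; the helper name and instance type are illustrative choices, not recommendations.

```python
def large_model_variant(model_name):
    # Illustrative helper: builds one entry for the ProductionVariants
    # list of CreateEndpointConfig when hosting a large model.
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": "ml.g5.12xlarge",  # example only; size per model
        "InitialInstanceCount": 1,
        # Both quotas are configurable up to 60 minutes (3600 seconds),
        # giving the container time to download and load model artifacts.
        "ModelDataDownloadTimeoutInSeconds": 3600,
        "ContainerStartupHealthCheckTimeoutInSeconds": 3600,
    }

# Usage (requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_endpoint_config(
#     EndpointConfigName="large-model-config",
#     ProductionVariants=[large_model_variant("my-large-model")],
# )
```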