Real-time inference

Real-time inference is ideal for workloads with interactive, low-latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that you can call for inference. These endpoints are fully managed and support autoscaling (see Automatically Scale Amazon SageMaker Models).
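As a rough illustration of calling such an endpoint, the following minimal Python sketch wraps the `invoke_endpoint` operation of the SageMaker Runtime client. It assumes the deployed model container accepts and returns JSON, which depends on your model. The helper name `invoke_json` and the endpoint name used in the note afterwards are hypothetical.

```python
import json


def invoke_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload to a real-time endpoint and return the parsed reply.

    runtime_client is expected to behave like a boto3 "sagemaker-runtime"
    client. The JSON content type is an assumption about the model container,
    not something every endpoint supports.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumed request format
        Accept="application/json",       # assumed response format
        Body=json.dumps(payload).encode("utf-8"),
    )
    # The response body is a stream; read it fully before parsing.
    return json.loads(response["Body"].read())
```

With boto3 installed and AWS credentials configured, the client would typically be created with `boto3.client("sagemaker-runtime")` and passed in, for example `invoke_json(client, "my-endpoint", {"instances": [[1.0, 2.0]]})`. Passing the client as an argument keeps the helper easy to exercise with a stub during testing.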

Topics

Deploy models for real-time inference
Invoke models for real-time inference
Manage your endpoints
Hosting options
Automatically Scale Amazon SageMaker Models
Host instance storage volumes
Safely validate models in production
Online Explainability with SageMaker Clarify
