Real-time Inference

Real-time inference is ideal for workloads with interactive, low-latency requirements. You can deploy your model to SageMaker hosting services and get an endpoint that you can use for inference. These endpoints are fully managed, support autoscaling (see Automatically Scale Amazon SageMaker Models), and can be deployed in multiple Availability Zones.

You can create a real-time inference endpoint using the AWS SDK for Python (Boto3) or the AWS CLI.
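With the AWS SDK for Python (Boto3), creating a real-time endpoint involves three calls: `create_model`, `create_endpoint_config`, and `create_endpoint`. The sketch below builds the request parameters for each step; the model name, instance type, and the `<...>` placeholders (image URI, S3 model artifact, IAM role ARN) are illustrative values you would replace with your own. The Boto3 calls themselves are shown in comments, since they require AWS credentials and real resources to run.

```python
# Sketch: deploying a model to a SageMaker real-time endpoint.
# Names and placeholder values are illustrative, not real resources.

model_name = "my-model"
endpoint_config_name = "my-model-config"
endpoint_name = "my-model-endpoint"

# Step 1: register the inference container image and model artifact.
create_model_args = {
    "ModelName": model_name,
    "PrimaryContainer": {
        "Image": "<ecr-inference-image-uri>",
        "ModelDataUrl": "s3://<bucket>/model.tar.gz",
    },
    "ExecutionRoleArn": "<sagemaker-execution-role-arn>",
}

# Step 2: define the endpoint configuration -- instance type and
# initial instance count for each production variant.
create_endpoint_config_args = {
    "EndpointConfigName": endpoint_config_name,
    "ProductionVariants": [
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.xlarge",
            "InitialInstanceCount": 1,
        }
    ],
}

# Step 3: create the endpoint (provisioning takes a few minutes, after
# which the endpoint status becomes InService):
#
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_args)
#   sm.create_endpoint_config(**create_endpoint_config_args)
#   sm.create_endpoint(
#       EndpointName=endpoint_name,
#       EndpointConfigName=endpoint_config_name,
#   )
#
# Once InService, invoke it with the SageMaker Runtime client:
#
#   runtime = boto3.client("sagemaker-runtime")
#   response = runtime.invoke_endpoint(
#       EndpointName=endpoint_name,
#       ContentType="application/json",
#       Body=b'{"instances": [[1.0, 2.0, 3.0]]}',
#   )
```

The endpoint configuration is created separately from the endpoint itself so that you can later update a running endpoint to a new configuration (for example, a different instance type or count) without deleting it.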