Select your cookie preferences

We use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous statistics, so we can understand how customers use our site and make improvements. Essential cookies cannot be deactivated, but you can choose “Customize” or “Decline” to decline performance cookies.

If you agree, AWS and approved third parties will also use cookies to provide useful site features, remember your preferences, and display relevant content, including relevant advertising. To accept or decline all non-essential cookies, choose “Accept” or “Decline.” To make more detailed choices, choose “Customize.”

Asynchronous inference

Focus mode
Asynchronous inference - Amazon SageMaker AI

Amazon SageMaker Asynchronous Inference is a capability in SageMaker AI that queues incoming requests and processes them asynchronously. This option is ideal for requests with large payload sizes (up to 1GB), long processing times (up to one hour), and near real-time latency requirements. Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process, so you only pay when your endpoint is processing requests.

How It Works

Creating an asynchronous inference endpoint is similar to creating real-time inference endpoints. You can use your existing SageMaker AI models and only need to specify the AsyncInferenceConfig object while creating your endpoint configuration with the EndpointConfig field in the CreateEndpointConfig API. The following diagram shows the architecture and workflow of Asynchronous Inference.

Architecture diagram of Asynchronous Inference showing how a user invokes an endpoint.

To invoke the endpoint, you need to place the request payload in Amazon S3. You also need to provide a pointer to this payload as a part of the InvokeEndpointAsync request. Upon invocation, SageMaker AI queues the request for processing and returns an identifier and output location as a response. Upon processing, SageMaker AI places the result in the Amazon S3 location. You can optionally choose to receive success or error notifications with Amazon SNS. For more information about how to set up asynchronous notifications, see Check prediction results.

Note

The presence of an asynchronous inference configuration (AsyncInferenceConfig) object in the endpoint configuration implies that the endpoint can only receive asynchronous invocations.

How Do I Get Started?

If you are a first-time user of Amazon SageMaker Asynchronous Inference, we recommend that you do the following:

Note that if your endpoint uses any of the features listed in this Exclusions page, you cannot use Asynchronous Inference.

PrivacySite termsCookie preferences
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.