What Is Amazon Elastic Inference?

Amazon Elastic Inference (Elastic Inference) is a resource you can attach to your Amazon Elastic Compute Cloud CPU instances, Amazon Deep Learning Containers, and SageMaker instances. Elastic Inference helps you accelerate your deep learning (DL) inference workloads. Elastic Inference accelerators come in multiple sizes and help you build intelligent capabilities into your applications.

Elastic Inference distributes model operations defined by TensorFlow, Apache MXNet (MXNet), and PyTorch between low-cost DL inference accelerators and the CPU of the instance. Elastic Inference also supports the Open Neural Network Exchange (ONNX) format through MXNet.


To run Amazon Elastic Inference, you need an Amazon Web Services account and should be familiar with launching Amazon EC2 instances, Amazon Deep Learning Containers, or SageMaker instances. To launch an Amazon EC2 instance, complete the steps in Setting up with Amazon EC2. Amazon S3 resources are required for installing packages via pip. For more information about setting up Amazon S3 resources, see the Amazon Simple Storage Service User Guide.

Pricing for Amazon Elastic Inference

You are charged for each second that an Elastic Inference accelerator is attached to an instance in the running state. You are not charged for an accelerator attached to an instance that is in the pending, stopping, stopped, shutting-down, or terminated state. You are also not charged when an Elastic Inference accelerator is in the unknown or impaired state.
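The billing rule above can be sketched as a small calculation. This is an illustrative sketch only: the `Interval` type, the `"ok"` accelerator state label, and the hourly rate are hypothetical names and values introduced for the example, not part of any AWS API or price list. Actual rates are listed on the Elastic Inference pricing page.

```python
from dataclasses import dataclass

# Per the pricing rule: only time with the instance in the "running" state is
# charged, and time with the accelerator "unknown" or "impaired" is not.
BILLABLE_INSTANCE_STATE = "running"
UNBILLED_ACCELERATOR_STATES = {"unknown", "impaired"}


@dataclass
class Interval:
    """Hypothetical record: a span during which both states stayed constant."""
    seconds: int
    instance_state: str
    accelerator_state: str = "ok"  # placeholder label for a healthy accelerator


def billable_seconds(intervals):
    """Sum the seconds that count toward the accelerator charge."""
    return sum(
        i.seconds
        for i in intervals
        if i.instance_state == BILLABLE_INSTANCE_STATE
        and i.accelerator_state not in UNBILLED_ACCELERATOR_STATES
    )


def estimate_charge(intervals, hourly_rate):
    """Per-second billing; hourly_rate is an assumed input, not a real price."""
    return billable_seconds(intervals) * hourly_rate / 3600


timeline = [
    Interval(120, "pending"),              # not charged: instance not running
    Interval(5400, "running"),             # charged
    Interval(600, "running", "impaired"),  # not charged: accelerator impaired
    Interval(1800, "stopped"),             # not charged: instance stopped
    Interval(1800, "running"),             # charged
]

print(billable_seconds(timeline))  # 7200
print(estimate_charge(timeline, hourly_rate=0.10))  # rate is made up
```

In this example only the two healthy running intervals (5400 s and 1800 s) contribute to the charge. The pending, stopped, and impaired intervals are excluded.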

You do not incur AWS PrivateLink charges for VPC endpoints to the Elastic Inference service when you have accelerators provisioned in the subnet.

For more information about pricing by Region for Elastic Inference, see Elastic Inference Pricing.

Elastic Inference Uses

You can use Elastic Inference in the following use cases:
