Amazon Elastic Inference
Developer Guide

What Is Amazon Elastic Inference?

Amazon Elastic Inference (EI) is a resource that you can attach to your Amazon EC2 CPU instances to accelerate your deep learning (DL) inference workloads. Elastic Inference accelerators come in multiple sizes and offer a cost-effective way to build intelligent capabilities into applications running on Amazon EC2 instances.

Elastic Inference distributes model operations defined by TensorFlow and Apache MXNet between low-cost, DL inference accelerators and the CPU of the instance. Elastic Inference also supports the Open Neural Network Exchange (ONNX) format through MXNet.
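As a concrete illustration, you can attach an Elastic Inference accelerator when you launch an instance by passing the ElasticInferenceAccelerators parameter to the EC2 RunInstances API. The sketch below builds the request parameters with boto3 in mind; the AMI ID, instance type, and accelerator size shown are placeholder assumptions, not recommendations.

```python
# Sketch of launching an EC2 instance with an Elastic Inference accelerator
# attached, using the EC2 RunInstances API via boto3. All concrete values
# (AMI ID, instance type, accelerator size) are placeholder assumptions.
launch_params = {
    "ImageId": "ami-0123456789abcdef0",   # placeholder AMI ID
    "InstanceType": "c5.large",           # CPU instance to accelerate
    "MinCount": 1,
    "MaxCount": 1,
    # One or more accelerators to attach to the instance at launch.
    "ElasticInferenceAccelerators": [
        {"Type": "eia2.medium", "Count": 1}
    ],
}

# With AWS credentials and networking configured, the call would be:
#   import boto3
#   ec2 = boto3.client("ec2")
#   response = ec2.run_instances(**launch_params)
```

Keeping the parameters in a plain dictionary makes the accelerator attachment explicit and easy to review before the instance is launched.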

To run Amazon Elastic Inference, you need an Amazon Web Services account and should be familiar with launching an EC2 instance. To launch an EC2 instance, complete the steps in Setting up with Amazon EC2. Amazon S3 resources are required for installing packages via pip. For more information about setting up Amazon S3 resources, see the Amazon Simple Storage Service Getting Started Guide.

Pricing for Amazon Elastic Inference

You are charged for each second that an Elastic Inference accelerator is attached to an instance in the running state. You are not charged for an accelerator attached to an instance that is in the pending, stopping, stopped, shutting-down, or terminated state. You are also not charged when an Elastic Inference accelerator is in the unknown or impaired state.
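The billing rule above can be sketched as a small calculation: only seconds spent attached to an instance in the running state accrue charges. The hourly rate used here is hypothetical; see Elastic Inference Pricing for actual regional rates.

```python
# Sketch of per-second Elastic Inference billing: an accelerator accrues
# charges only while its instance is in the "running" state. The hourly
# rate is a hypothetical placeholder, not an actual AWS price.

def accelerator_cost(state_seconds, hourly_rate):
    """Estimate accelerator charges.

    state_seconds: mapping of instance state (e.g. "running", "stopped")
                   to seconds the accelerator spent attached in that state.
    hourly_rate:   accelerator price per hour (assumed, region-specific).
    """
    billable_seconds = sum(
        seconds for state, seconds in state_seconds.items()
        if state == "running"  # pending, stopping, stopped, etc. are free
    )
    return billable_seconds * hourly_rate / 3600.0

# Example: 2 hours running plus 1 hour stopped, at a hypothetical $0.12/hour.
cost = accelerator_cost({"running": 7200, "stopped": 3600}, hourly_rate=0.12)
# cost == 0.24 (only the running time is billed)
```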

You do not incur AWS PrivateLink charges for VPC endpoints to the Elastic Inference service when you have accelerators provisioned in the subnet.

For more information about pricing by region for Elastic Inference, see Elastic Inference Pricing.

Next Up

Amazon Elastic Inference Basics