
Amazon Elastic Inference Basics

When you configure an Amazon EC2 instance to launch with an Elastic Inference accelerator, AWS finds available accelerator capacity. It then establishes a network connection between your instance and the accelerator.

The following Elastic Inference accelerator types are available. You can attach any Elastic Inference accelerator type to any Amazon EC2 instance type.

Accelerator Type   FP32 Throughput (TFLOPS)   FP16 Throughput (TFLOPS)   Memory (GB)
eia2.medium        1                          8                          2
eia2.large         2                          16                         4
eia2.xlarge        4                          32                         8
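
As a rough illustration of how the table can guide sizing, the following sketch picks the smallest accelerator type that satisfies a memory and FP32 throughput requirement. The figures are copied from the table above. The Accelerator class and the smallest_fit helper are hypothetical names introduced for this example and are not part of any AWS SDK. Real sizing should also account for runtime overhead beyond the raw model size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Accelerator:
    name: str
    fp32_tflops: float
    fp16_tflops: float
    memory_gb: float


# Values copied from the accelerator table, ordered smallest to largest.
ACCELERATORS = (
    Accelerator("eia2.medium", 1, 8, 2),
    Accelerator("eia2.large", 2, 16, 4),
    Accelerator("eia2.xlarge", 4, 32, 8),
)


def smallest_fit(memory_gb: float, fp32_tflops: float = 0.0) -> Optional[Accelerator]:
    """Return the smallest accelerator meeting both requirements, or None."""
    for acc in ACCELERATORS:
        if acc.memory_gb >= memory_gb and acc.fp32_tflops >= fp32_tflops:
            return acc
    return None


if __name__ == "__main__":
    # Hypothetical requirement: a model that needs about 3 GB of memory.
    choice = smallest_fit(memory_gb=3)
    print(choice.name if choice else "no single accelerator is large enough")
```

If no single accelerator type is large enough, the helper returns None rather than guessing, which leaves the decision (for example, a GPU instance instead) to the caller.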

You can attach multiple Elastic Inference accelerators of various sizes to a single Amazon EC2 instance when launching the instance. With multiple accelerators, you can run inference for multiple models on a single fleet of Amazon EC2 instances. If your models require different amounts of GPU memory and compute capacity, you can choose the appropriate accelerator size for each model and attach it to your instance. For faster response times, load each model onto an Elastic Inference accelerator once, then continue making inference calls across the attached accelerators without unloading and reloading models for each call. By attaching multiple accelerators to a single instance, you avoid deploying multiple fleets of CPU or GPU instances and the associated cost. For more information on attaching multiple accelerators to a single instance, see Using TensorFlow Models with Elastic Inference, Using MXNet Models with Elastic Inference, and Using PyTorch Models with Elastic Inference.


Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance run AWS Deep Learning AMI (DLAMI) version 25 or later. For more information on the AWS Deep Learning AMI, see What Is the AWS Deep Learning AMI?.
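
The multi-accelerator launch described above could be expressed roughly as follows. This is a minimal sketch that only builds the request parameters. It assumes the ElasticInferenceAccelerators parameter of the EC2 RunInstances API (a list of Type and Count entries), and the helper names and placeholder IDs are hypothetical. A real launch also needs settings not shown here, such as an IAM instance profile and security groups.

```python
from typing import Any, Dict, List


def accelerator_specs(counts: Dict[str, int]) -> List[Dict[str, Any]]:
    """Convert {"eia2.medium": 1, ...} into a list of Type/Count entries."""
    specs = []
    for acc_type, count in sorted(counts.items()):
        if count < 1:
            raise ValueError(f"count for {acc_type} must be at least 1")
        specs.append({"Type": acc_type, "Count": count})
    return specs


def launch_params(image_id: str, instance_type: str, subnet_id: str,
                  counts: Dict[str, int]) -> Dict[str, Any]:
    """Build keyword arguments for a single-instance launch with accelerators."""
    return {
        "ImageId": image_id,          # assumed to be a DLAMI version 25 or later
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
        "ElasticInferenceAccelerators": accelerator_specs(counts),
    }


if __name__ == "__main__":
    # Placeholder identifiers; substitute values from your own account.
    params = launch_params("ami-EXAMPLE", "c5.large", "subnet-EXAMPLE",
                           {"eia2.medium": 1, "eia2.large": 2})
    print(params)
    # With boto3 installed and credentials configured, the launch would be:
    #   import boto3
    #   boto3.client("ec2").run_instances(**params)
```

Keeping the parameter construction separate from the API call makes it possible to inspect or validate the request before anything is launched.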

An Elastic Inference accelerator is not part of the hardware that makes up your instance. Instead, the accelerator is attached through the network using an AWS PrivateLink endpoint service. The endpoint service routes traffic from your instance to the Elastic Inference accelerator configured with your instance.


An Elastic Inference accelerator cannot be modified through the management console of your instance.

Before you launch an instance with an Elastic Inference accelerator, you must create an AWS PrivateLink endpoint service. Only a single endpoint service is needed in every Region to connect instances with Elastic Inference accelerators. A single endpoint service can span multiple Availability Zones. For more information, see VPC Endpoint Services (AWS PrivateLink).
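
The endpoint setup described above might look like the following sketch, which builds the parameters for an interface endpoint spanning several subnets (and therefore several Availability Zones). It is written under assumptions: the service name pattern com.amazonaws.<region>.elastic-inference.runtime should be verified against the services available in your Region, and the helper name and placeholder IDs are hypothetical.

```python
from typing import Any, Dict, Sequence


def endpoint_params(region: str, vpc_id: str, subnet_ids: Sequence[str],
                    security_group_ids: Sequence[str]) -> Dict[str, Any]:
    """Build keyword arguments for one interface endpoint in a VPC."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Assumed service name pattern; confirm it for your Region.
        "ServiceName": f"com.amazonaws.{region}.elastic-inference.runtime",
        # One subnet per Availability Zone lets a single endpoint span zones.
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "PrivateDnsEnabled": True,
    }


if __name__ == "__main__":
    # Placeholder identifiers; substitute values from your own account.
    params = endpoint_params("us-west-2", "vpc-EXAMPLE",
                             ["subnet-EXAMPLE-a", "subnet-EXAMPLE-b"],
                             ["sg-EXAMPLE"])
    print(params)
    # With boto3 installed and credentials configured, the call would be:
    #   import boto3
    #   boto3.client("ec2", region_name="us-west-2").create_vpc_endpoint(**params)
```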

Figure: An Elastic Inference accelerator attached to an Amazon EC2 instance.

You can use the Amazon Elastic Inference-enabled TensorFlow, TensorFlow Serving, Apache MXNet, or PyTorch libraries to load models and make inference calls. These modified versions of the frameworks automatically detect the presence of Elastic Inference accelerators. They then optimally distribute the model operations between the Elastic Inference accelerator and the CPU of the instance. The AWS Deep Learning AMIs include the latest releases of Elastic Inference-enabled TensorFlow, TensorFlow Serving, Apache MXNet, and PyTorch. If you are using custom AMIs or container images, you can download and install the required TensorFlow, Apache MXNet, and PyTorch libraries from Amazon S3.

Elastic Inference Uses

You can use Elastic Inference in the following use cases: