Amazon Elastic Inference Basics
When you configure an Amazon EC2 instance to launch with an Elastic Inference accelerator, AWS finds available accelerator capacity. It then establishes a network connection between your instance and the accelerator.
The following Elastic Inference accelerator types are available. You can attach any Elastic Inference accelerator type to any Amazon EC2 instance type.
| Accelerator Type | FP32 Throughput (TFLOPS) | FP16 Throughput (TFLOPS) | Memory (GB) |
|---|---|---|---|
| eia2.medium | 1 | 8 | 2 |
| eia2.large | 2 | 16 | 4 |
| eia2.xlarge | 4 | 32 | 8 |
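For reference, you can also list the available accelerator types programmatically. The following is a minimal sketch that assumes a boto3 version that includes the elastic-inference client and its describe_accelerator_types operation; the Region shown is a placeholder.

```python
import boto3

# Minimal sketch: query the Elastic Inference accelerator catalog.
# Assumes a boto3 version that includes the "elastic-inference" client;
# the Region is a placeholder.
ei = boto3.client("elastic-inference", region_name="us-west-2")

# Each entry describes one accelerator type (name, memory, throughput).
for accelerator_type in ei.describe_accelerator_types()["acceleratorTypes"]:
    print(accelerator_type)
```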
You can attach multiple Elastic Inference accelerators of various sizes to a single Amazon EC2 instance when you launch the instance. With multiple accelerators, you can run inference for multiple models on a single fleet of Amazon EC2 instances. If your models require different amounts of GPU memory and compute capacity, you can choose the accelerator size that is appropriate for each model. For faster response times, load your models to the accelerators once and continue making inference calls without unloading a model for each call. By attaching multiple accelerators to a single instance, you avoid deploying multiple fleets of CPU or GPU instances and the associated cost. For more information on attaching multiple accelerators to a single instance, see Using TensorFlow Models with Elastic Inference, Using MXNet Models with Elastic Inference, and Using PyTorch Models with Elastic Inference.
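As an illustration, the following sketch launches a single instance with two differently sized accelerators attached by using the EC2 RunInstances API through boto3. The AMI, subnet, and security group IDs are placeholders for resources in your own account.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Sketch: launch one instance with two Elastic Inference accelerators of
# different sizes. The AMI, subnet, and security group IDs are placeholders;
# use a Deep Learning AMI (version 25 or later) for multi-accelerator support.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="c5.xlarge",
    MinCount=1,
    MaxCount=1,
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    ElasticInferenceAccelerators=[
        {"Type": "eia2.medium", "Count": 1},
        {"Type": "eia2.large", "Count": 1},
    ],
)

print(response["Instances"][0]["InstanceId"])
```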
Note
Attaching multiple Elastic Inference accelerators to a single Amazon EC2 instance requires that the instance has AWS Deep Learning AMIs (DLAMI) version 25 or later. For more information on the AWS Deep Learning AMIs, see What Is the AWS Deep Learning AMI?.
An Elastic Inference accelerator is not part of the hardware that makes up your instance. Instead, the accelerator is attached through the network using an AWS PrivateLink endpoint service. The endpoint service routes traffic from your instance to the Elastic Inference accelerator configured with your instance.
Note
An Elastic Inference accelerator cannot be modified through the management console of your instance.
Before you launch an instance with an Elastic Inference accelerator, you must create an AWS PrivateLink endpoint service. Only a single endpoint service is needed in each Availability Zone to connect instances with Elastic Inference accelerators, and one endpoint service can span multiple Availability Zones. For more information, see VPC Endpoint Services (AWS PrivateLink).
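For illustration, the following sketch creates an interface VPC endpoint for the Elastic Inference runtime by using the EC2 CreateVpcEndpoint API through boto3. The VPC, subnet, and security group IDs are placeholders, and the service name is assumed to follow the regional com.amazonaws.&lt;region&gt;.elastic-inference.runtime pattern.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Sketch: create the interface endpoint that routes traffic from instances
# to Elastic Inference accelerators. The VPC, subnet, and security group IDs
# are placeholders; the service name is assumed to follow the regional
# com.amazonaws.<region>.elastic-inference.runtime pattern.
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-west-2.elastic-inference.runtime",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)

print(response["VpcEndpoint"]["VpcEndpointId"])
```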
You can use Amazon Elastic Inference enabled TensorFlow, TensorFlow Serving, Apache MXNet, or PyTorch libraries to load models and make inference calls. The modified versions of these frameworks automatically detect the presence of Elastic Inference accelerators and optimally distribute the model operations between the Elastic Inference accelerator and the CPU of the instance. The AWS Deep Learning AMIs include the Elastic Inference enabled versions of these frameworks.
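As a brief illustration of how these frameworks use the accelerator, the following sketch assumes the Elastic Inference enabled build of Apache MXNet, which adds an mx.eia() context; the model checkpoint name and input shape are placeholders for your own model.

```python
from collections import namedtuple

import mxnet as mx

# Sketch: run inference through the attached accelerator with Elastic
# Inference enabled MXNet, which provides the mx.eia() context. The
# checkpoint prefix "resnet-152" and the input shape are placeholders.
sym, arg_params, aux_params = mx.model.load_checkpoint("resnet-152", 0)
mod = mx.mod.Module(symbol=sym, context=mx.eia(), label_names=None)
mod.bind(for_training=False, data_shapes=[("data", (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params, allow_missing=True)

Batch = namedtuple("Batch", ["data"])
mod.forward(Batch([mx.nd.random.uniform(shape=(1, 3, 224, 224))]))
print(mod.get_outputs()[0].asnumpy().shape)
```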
Elastic Inference Uses
To learn how to use Elastic Inference, see the following topics:
- For Elastic Inference-enabled TensorFlow and TensorFlow 2 with Python, see Using TensorFlow Models with Elastic Inference
- For Elastic Inference-enabled MXNet with Python, Java, and Scala, see Using MXNet Models with Elastic Inference
- For Elastic Inference-enabled PyTorch with Python, see Using PyTorch Models with Elastic Inference
- For Elastic Inference with SageMaker, see MXNet Elastic Inference with SageMaker
- For Amazon Deep Learning Containers with Elastic Inference on Amazon EC2, Amazon ECS, and SageMaker, see Using Amazon Deep Learning Containers With Elastic Inference
- For security information on Elastic Inference, see Security in Amazon Elastic Inference
- To troubleshoot your Elastic Inference workflow, see Troubleshooting