
How EI Works

Amazon Elastic Inference accelerators are network-attached devices that work together with the SageMaker instances in your endpoint to accelerate your inference calls. Elastic Inference accelerates inference by letting you attach fractional GPU capacity to any SageMaker instance: you select the client instance that runs your application and attach an Elastic Inference accelerator sized to provide the right amount of GPU acceleration for your inference needs. This helps you lower costs when a dedicated GPU instance would be underutilized for inference. We recommend trying Elastic Inference with your model using different CPU instance types and accelerator sizes to find the most cost-effective combination.
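As an illustration, the following is a minimal sketch of deploying a model to an endpoint with an attached accelerator using the SageMaker Python SDK. The container image, model artifact location, and IAM role ARN shown are placeholders, not values from this guide; substitute your own.

```python
# Minimal sketch: deploy a model to a SageMaker endpoint with an
# Elastic Inference accelerator attached (SageMaker Python SDK v2).
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="<your-inference-container-image>",    # placeholder
    model_data="s3://<your-bucket>/model.tar.gz",    # placeholder
    role="<your-sagemaker-execution-role-arn>",      # placeholder
    sagemaker_session=session,
)

# The CPU client instance runs your application; the EI accelerator
# supplies fractional GPU acceleration over the network.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.medium",
)
```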

The following EI accelerator types are available. You can configure your endpoints or notebook instances with any EI accelerator type (see the sketch after the table for a notebook instance example).

The table lists each accelerator's throughput in teraflops (TFLOPS) for single-precision floating-point (F32) and half-precision floating-point (F16) operations, along with its memory in GB.

Accelerator Type   F32 Throughput (TFLOPS)   F16 Throughput (TFLOPS)   Memory (GB)
ml.eia2.medium     1                         8                         2
ml.eia2.large      2                         16                        4
ml.eia2.xlarge     4                         32                        8
ml.eia1.medium     1                         8                         1
ml.eia1.large      2                         16                        2
ml.eia1.xlarge     4                         32                        4
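The following is a minimal sketch of attaching an accelerator to a notebook instance with boto3. The notebook instance name and role ARN are placeholders for illustration.

```python
# Minimal sketch: create a notebook instance with an EI accelerator
# attached, using the boto3 SageMaker client.
import boto3

sm = boto3.client("sagemaker")

sm.create_notebook_instance(
    NotebookInstanceName="ei-example-notebook",        # placeholder name
    InstanceType="ml.t3.medium",
    RoleArn="<your-sagemaker-execution-role-arn>",     # placeholder
    AcceleratorTypes=["ml.eia2.medium"],               # any type from the table
)
```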