Use Amazon SageMaker Elastic Inference (EI)

Amazon Elastic Inference (EI) lets you increase the throughput and decrease the latency of real-time inferences from deep learning models that are deployed as Amazon SageMaker hosted models, at a fraction of the cost of using a full GPU instance for your endpoint. To use it, add an EI accelerator in one of the available sizes to a deployable model in addition to a CPU instance type, and then add that model as a production variant to the endpoint configuration that you use to deploy a hosted endpoint. You can also add an EI accelerator to an Amazon SageMaker notebook instance so that you can test and evaluate inference performance while you build your models.

Elastic Inference is supported in EI-enabled versions of TensorFlow, Apache MXNet, and PyTorch. To use any other deep learning framework, export your model by using ONNX, and then import your model into MXNet. You can then use your model with EI as an MXNet model. For information about importing an ONNX model into MXNet, see
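The ONNX route described above can be sketched in a few lines of Python. This is a minimal sketch, not the documented procedure: it assumes an MXNet 1.x installation that includes the `mxnet.contrib.onnx` module, and the function names `onnx_to_mxnet_checkpoint` and `checkpoint_files` are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path


def checkpoint_files(prefix, epoch=0):
    """Return the file names that MXNet's save_checkpoint writes for a prefix."""
    return (f"{prefix}-symbol.json", f"{prefix}-{epoch:04d}.params")


def onnx_to_mxnet_checkpoint(onnx_path, prefix, epoch=0):
    """Import an ONNX model into MXNet and save it as an MXNet checkpoint.

    Hypothetical helper: assumes MXNet 1.x with mxnet.contrib.onnx available.
    """
    if not Path(onnx_path).is_file():
        raise FileNotFoundError(onnx_path)

    # Imported lazily so this module still loads where MXNet is not installed.
    import mxnet as mx
    from mxnet.contrib import onnx as onnx_mxnet

    # import_model returns the symbol graph plus argument and auxiliary params.
    sym, arg_params, aux_params = onnx_mxnet.import_model(onnx_path)
    mx.model.save_checkpoint(prefix, epoch, sym, arg_params, aux_params)
    return checkpoint_files(prefix, epoch)
```

The saved symbol and parameter files can then be packaged like any other MXNet model artifact before you deploy it with EI.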

How EI Works

Amazon Elastic Inference accelerators are network-attached devices that work with the Amazon SageMaker instances in your endpoint to accelerate your inference calls. Elastic Inference accelerates inference by allowing you to attach fractional GPUs to any Amazon SageMaker instance. You can select the client instance that runs your application and attach an Elastic Inference accelerator that provides the right amount of GPU acceleration for your inference needs. Elastic Inference helps you lower your cost when your inference workload would not fully utilize a GPU instance. We recommend trying Elastic Inference with your model using different CPU instances and accelerator sizes.

The following EI accelerator types are available. You can configure your endpoints or notebook instances with any EI accelerator type.

In the table, the throughput in teraflops (TFLOPS) is listed for both single-precision floating-point (F32) and half-precision floating-point (F16) operations. The memory in GB is also listed.

Accelerator Type    F32 Throughput (TFLOPS)    F16 Throughput (TFLOPS)    Memory (GB)
ml.eia2.medium      1                          8                          2
ml.eia2.large       2                          16                         4
ml.eia2.xlarge      4                          32                         8
ml.eia1.medium      1                          8                          1
ml.eia1.large       2                          16                         2
ml.eia1.xlarge      4                          32                         4

Choose an EI Accelerator Type

Consider the following factors when choosing an accelerator type for a hosted model:

  • Models, input tensors, and batch sizes influence the amount of accelerator memory you need. Start with an accelerator type that provides at least as much memory as the file size of your trained model. Keep in mind that at runtime a model might use significantly more memory than its file size.

  • Demands on CPU compute resources, main system memory, and GPU-based acceleration and accelerator memory vary significantly between different kinds of deep learning models. The latency and throughput requirements of the application also determine the amount of compute and acceleration you need. Thoroughly test different configurations of instance types and EI accelerator sizes to make sure you choose the configuration that best fits the performance needs of your application.
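As a starting point for that testing, the memory guideline above can be turned into a small shortlist helper. This is only a sketch: the memory figures are copied from the accelerator table in this topic, while the `headroom` multiplier and the function name `candidate_accelerators` are assumptions introduced here, not part of any AWS API. It narrows the list of accelerators to try; it does not replace measuring latency and throughput.

```python
# Accelerator memory in GB, copied from the accelerator table in this topic.
EI_MEMORY_GB = {
    "ml.eia1.medium": 1,
    "ml.eia1.large": 2,
    "ml.eia1.xlarge": 4,
    "ml.eia2.medium": 2,
    "ml.eia2.large": 4,
    "ml.eia2.xlarge": 8,
}


def candidate_accelerators(model_size_gb, headroom=1.5):
    """List accelerator types with enough memory, smallest first.

    headroom is a hypothetical safety factor for the extra memory a model
    can need at runtime beyond its file size; tune it from your own tests.
    """
    if model_size_gb <= 0 or headroom < 1:
        raise ValueError("model_size_gb must be > 0 and headroom must be >= 1")
    required_gb = model_size_gb * headroom
    fits = [(mem, name) for name, mem in EI_MEMORY_GB.items() if mem >= required_gb]
    # Sort by memory first, then by name, so the cheapest-looking options lead.
    return [name for mem, name in sorted(fits)]
```

For example, `candidate_accelerators(0.5)` keeps every accelerator type, while a model that needs more than 8 GB after headroom returns an empty list, which suggests that EI is not a fit for that model.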

For more information on selecting an EI accelerator, see:

Use EI in an Amazon SageMaker Notebook Instance

Typically, you build and test machine learning models in an Amazon SageMaker notebook before you deploy them for production. You can attach EI to your notebook instance when you create the notebook instance. To test inference performance, you can set up an endpoint that is hosted locally on the notebook instance by using the local mode supported by TensorFlow, MXNet, and PyTorch estimators and models in the Amazon SageMaker Python SDK. Elastic Inference-enabled PyTorch is not currently supported on notebook instances. For instructions on how to attach EI to a notebook instance and set up a local endpoint for inference, see Attach EI to a Notebook Instance. There are also Elastic Inference-enabled Amazon SageMaker Notebook Jupyter kernels for Elastic Inference-enabled versions of TensorFlow and Apache MXNet. For information about using Amazon SageMaker notebook instances, see Use Amazon SageMaker Notebook Instances.
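The notebook workflow above might look like the following sketch. It assumes the Boto3 SageMaker client's `create_notebook_instance` operation with its `AcceleratorTypes` parameter; the default instance type, the helper names, and the local-mode keyword values (in particular `local_sagemaker_notebook`, as recalled from the notebook-instance instructions) are assumptions that you should check against the linked topics.

```python
def notebook_instance_request(name, role_arn,
                              instance_type="ml.t2.medium",
                              accelerator_type="ml.eia2.medium"):
    """Build keyword arguments for the Boto3 create_notebook_instance call.

    Hypothetical helper; the default instance and accelerator types are
    placeholders, not recommendations.
    """
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
        # EI is attached at creation time as a list of accelerator types.
        "AcceleratorTypes": [accelerator_type],
    }


def create_notebook_instance(request, region_name=None):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("sagemaker", region_name=region_name)
    return client.create_notebook_instance(**request)


# Keyword arguments for a local-mode deploy() call made from inside the
# notebook, for example estimator.deploy(**LOCAL_EI_DEPLOY_KWARGS).
# Assumption: these values follow the notebook-instance instructions.
LOCAL_EI_DEPLOY_KWARGS = {
    "initial_instance_count": 1,
    "instance_type": "local",
    "accelerator_type": "local_sagemaker_notebook",
}
```

Building the request separately from sending it makes it easy to review the accelerator choice before any resources are created.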

Use EI on a Hosted Endpoint

When you are ready to deploy your model for production to provide inferences, you create an Amazon SageMaker hosted endpoint. You can attach EI to the instance where your endpoint is hosted to improve its inference performance. For instructions on how to attach EI to a hosted endpoint instance, see Use EI on Amazon SageMaker Hosted Endpoints.
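With the Amazon SageMaker Python SDK, attaching EI to a hosted endpoint comes down to passing an accelerator type alongside the CPU instance type when you call `deploy`. The wrapper below is a minimal sketch: `deploy_with_ei` is a hypothetical helper, the default instance and accelerator types are placeholders, and `model` is assumed to be any SDK model or estimator object whose `deploy` method accepts `accelerator_type`.

```python
def deploy_with_ei(model, instance_type="ml.m5.large",
                   accelerator_type="ml.eia2.medium", instance_count=1):
    """Deploy a model to a hosted endpoint with an EI accelerator attached.

    Hypothetical wrapper around the SDK's deploy() method. The CPU instance
    hosts the endpoint; the accelerator adds the fractional GPU.
    """
    if not accelerator_type.startswith("ml.eia"):
        raise ValueError(f"not an EI accelerator type: {accelerator_type}")
    return model.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
        accelerator_type=accelerator_type,
    )
```

A typical call would be `predictor = deploy_with_ei(my_model)`, where `my_model` stands for a TensorFlow, MXNet, or PyTorch model object that you created with the SDK.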

Frameworks that Support EI

EI is designed to be used with AWS-enhanced versions of the TensorFlow, Apache MXNet, or PyTorch machine learning frameworks. These enhanced versions of the frameworks are automatically built into containers when you use the Amazon SageMaker Python SDK, or you can download them as binary files and import them in your own Docker containers.

You can download the EI-enabled binary for TensorFlow from the Amazon S3 bucket at For information about building a container that uses the EI-enabled version of TensorFlow, see

You can download the EI-enabled binary for Apache MXNet from the public Amazon S3 bucket at For information about building a container that uses the EI-enabled version of MXNet, see

To download the Elastic Inference-enabled PyTorch binary from the public Amazon S3 bucket, see For information about building a container that uses Elastic Inference-enabled PyTorch, see

To use EI in a hosted endpoint, you can use any of the following, depending on your needs.

  • SageMaker Python SDK TensorFlow - if you want to use TensorFlow and you don't need to build a custom container.

  • SageMaker Python SDK MXNet - if you want to use MXNet and you don't need to build a custom container.

  • SageMaker Python SDK PyTorch - if you want to use PyTorch and you don't need to build a custom container.

  • The low-level AWS SDK for Python (Boto3) - if you need to build a custom container.

Typically, you don't need to create a custom container unless your model is very complex and requires extensions to a framework that the Amazon SageMaker pre-built containers do not support.
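For the low-level Boto3 path, the accelerator is specified on the production variant in the endpoint configuration. The following is a sketch under stated assumptions: it uses the Boto3 SageMaker client's `create_endpoint_config` and `create_endpoint` operations, it assumes that a model named `model_name` was already registered with `create_model`, and the helper names and the variant name `AllTraffic` are illustrative choices.

```python
def endpoint_config_request(config_name, model_name, instance_type,
                            accelerator_type, instance_count=1):
    """Build keyword arguments for the Boto3 create_endpoint_config call."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",  # illustrative variant name
                "ModelName": model_name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
                # The EI accelerator is attached per production variant.
                "AcceleratorType": accelerator_type,
            }
        ],
    }


def create_ei_endpoint(endpoint_name, config_request, region_name=None):
    """Create the endpoint configuration and the endpoint itself.

    Needs boto3 and AWS credentials; assumes the model already exists.
    """
    import boto3  # imported lazily so the builder above works without boto3

    sm = boto3.client("sagemaker", region_name=region_name)
    sm.create_endpoint_config(**config_request)
    return sm.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_request["EndpointConfigName"],
    )
```

Keeping the request builder free of AWS calls lets you inspect or unit test the configuration before anything is created in your account.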

Use EI with Amazon SageMaker Built-in Algorithms

Currently, the Image Classification and Object Detection built-in algorithms support EI. For an example that uses the Image Classification algorithm with EI, see

EI Sample Notebooks

The following sample notebooks provide examples of using EI in Amazon SageMaker: