Deep Learning AMI
Developer Guide

Use Elastic Inference with TensorFlow

Elastic Inference (EI) is available only on instances that were launched with an Elastic Inference Accelerator.

Review Selecting the Instance Type for DLAMI to select your desired instance type, and also review Elastic Inference Prerequisites for the instructions related to Elastic Inference. For detailed instructions on how to launch a DLAMI with an Elastic Inference Accelerator, see the Elastic Inference documentation.

TensorFlow Serving is the only inference mode that EI supports. Predictor and Estimator are not supported. If you haven't tried TensorFlow Serving before, we recommend that you try the TensorFlow Serving tutorial first.

Activate the TensorFlow Elastic Inference Environment

  1. First, connect to your Deep Learning AMI with Conda and activate the Python 2.7 TensorFlow environment. The example scripts are not compatible with Python 3.x.

    $ source activate amazonei_tensorflow_p27
  2. The remaining parts of this guide assume you are using the amazonei_tensorflow_p27 environment.


If you are switching between the MXNet and TensorFlow Elastic Inference environments, you must Stop and then Start your instance to reattach the Elastic Inference Accelerator. Rebooting is not sufficient because the process requires a complete shutdown.
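The stop-then-start sequence can also be scripted. The following is a minimal sketch, assuming a boto3 EC2 client (its `stop_instances` and `start_instances` calls and the `instance_stopped` and `instance_running` waiters). The helper name `restart_instance` and the instance ID in the usage comment are hypothetical and are not part of this guide.

```python
def restart_instance(ec2_client, instance_id):
    """Fully stop, then start, an EC2 instance (a reboot is not sufficient)."""
    ids = [instance_id]

    # Request a complete shutdown and block until the instance is stopped.
    ec2_client.stop_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_stopped").wait(InstanceIds=ids)

    # Start the instance again and block until it reports as running.
    ec2_client.start_instances(InstanceIds=ids)
    ec2_client.get_waiter("instance_running").wait(InstanceIds=ids)


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the instance ID below is a placeholder):
#
#   import boto3
#   restart_instance(boto3.client("ec2"), "i-0123456789abcdef0")
```

The client is passed in as an argument so that the sequence can be exercised without touching a real instance.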

Using Elastic Inference with TensorFlow

The EI-enabled version of TensorFlow lets you use EI seamlessly with few changes to your TensorFlow code. You can use EI with TensorFlow through EI TensorFlow Serving, as described in the following section.

Using EI TensorFlow Serving

Elastic Inference TensorFlow Serving uses the same API as normal TensorFlow Serving. The only difference is that the entry point is a different binary named AmazonEI_TensorFlow_Serving_v1.11_v1, which is located at /usr/local/bin/AmazonEI_TensorFlow_Serving_v1.11_v1. The examples in the next section show the commands for using TensorFlow Serving.
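If you launch the server from a script rather than by hand, it can help to check for the binary and assemble the command line in one place. The following is a minimal sketch that uses only the Python standard library. The binary path and the --model_name, --model_base_path, and --port flags come from this guide; the helper names `build_serving_command` and `binary_is_available` are hypothetical.

```python
import os

# Location of the EI TensorFlow Serving entry point, as given in this guide.
EI_SERVING_BINARY = "/usr/local/bin/AmazonEI_TensorFlow_Serving_v1.11_v1"


def binary_is_available(binary=EI_SERVING_BINARY):
    """Return True if the serving binary exists and is executable."""
    return os.path.isfile(binary) and os.access(binary, os.X_OK)


def build_serving_command(model_name, model_base_path, port=9000,
                          binary=EI_SERVING_BINARY):
    """Return the argument list for launching EI TensorFlow Serving."""
    # The server requires an absolute model_base_path, so fail early.
    if not os.path.isabs(model_base_path):
        raise ValueError(
            "model_base_path must be an absolute path: %r" % model_base_path)
    return [
        binary,
        "--model_name=%s" % model_name,
        "--model_base_path=%s" % model_base_path,
        "--port=%d" % port,
    ]
```

The returned list can be passed directly to `subprocess.Popen`, which avoids shell quoting issues.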

EI TensorFlow Serving Examples

The following example shows how to serve a model such as Inception. As a general rule, a servable model and client scripts must already be downloaded to your DLAMI.

Serve and Test Inference with an Inception Model

  1. Download the model to your home directory.

    $ curl -O
  2. Unzip the model.

    $ unzip
  3. Download a picture of a husky to your home directory.

    $ curl -O
  4. Launch the server. Note that model_base_path must be an absolute path.

    For Ubuntu, use:

    $ AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=inception --model_base_path=/home/ubuntu/inception_example --port=9000

    For Amazon Linux, use:

    $ AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=inception --model_base_path=/home/ec2-user/inception_example --port=9000
  5. With the server running in the foreground, you need to launch another terminal session to continue. Open a new terminal and activate TensorFlow with source activate amazonei_tensorflow_p27. Then use your preferred text editor to create a script that has the following content. This script takes an image filename as a parameter and gets a prediction result from the pre-trained model.

    from __future__ import print_function

    import grpc
    import tensorflow as tf
    from PIL import Image
    import numpy as np
    import time
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2_grpc

    # Command-line flags: server address and path to the input image.
    tf.app.flags.DEFINE_string('server', 'localhost:9000',
                               'PredictionService host:port')
    tf.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
    FLAGS = tf.app.flags.FLAGS


    def main(_):
        channel = grpc.insecure_channel(FLAGS.server)
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        # Send request
        with Image.open(FLAGS.image) as f:
            f.load()
            # See prediction_service.proto for gRPC request/response details.
            data = np.asarray(f, dtype="float")
            data = np.resize(data, (224, 224, 3))
            data = np.expand_dims(data, axis=0)
            request = predict_pb2.PredictRequest()
            request.model_spec.name = 'inception'
            request.inputs['Placeholder:0'].CopyFrom(
                tf.contrib.util.make_tensor_proto(
                    data, shape=[1, 224, 224, 3], dtype=tf.float32))
            start = time.time()
            result = stub.Predict(request, 60.0)  # 60 secs timeout
            stop = time.time()
            print("Inception prediction took %fs" % (stop - start))
            print("Inception Client Passed")


    if __name__ == '__main__':
        tf.app.run()
  6. Now run the script, passing the server location and port and the husky photo's filename as parameters.

    $ python --server=localhost:9000 --image Siberian_Husky_bi-eyed_Flickr.jpg
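Because the server and the client run in separate terminal sessions, the client can be started before the server is listening. The following is a minimal sketch of a readiness check that polls the server port by using only the Python standard library. The helper name `wait_for_port` and its timeout values are hypothetical. A successful TCP connection shows only that the port is accepting connections; it is an assumption, not a guarantee, that the model has finished loading by then.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.time() + timeout
    while True:
        try:
            sock = socket.create_connection((host, port), timeout=interval)
        except (socket.error, socket.timeout):
            # Not listening yet: give up at the deadline, otherwise retry.
            if time.time() >= deadline:
                return False
            time.sleep(interval)
        else:
            sock.close()
            return True


# Usage: wait for the server from step 4 before running the client script.
#
#   if not wait_for_port("localhost", 9000):
#       raise RuntimeError("server did not start listening on port 9000")
```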

Considerations When Using EI-enabled TensorFlow

  • Warmup: TensorFlow Serving provides a warmup feature to pre-load models and reduce the delay that is typical of the first inference request. Amazon EI TensorFlow Serving only supports warming up the “serving_default” signature definition.

  • Signature Definitions: Using multiple signature definitions can have a multiplicative effect on the amount of accelerator memory consumed. If you plan to exercise more than one signature definition for your inference calls, you should test these scenarios as you determine the accelerator type for your application.

More Models and Resources

  1. TensorFlow Serving - TensorFlow Serving has inference examples that can be used with EI.

For more tutorials and examples, see the framework's official Python documentation, the TensorFlow Python API.