
Using TensorFlow Elastic Inference accelerators on EC2

When using Elastic Inference, you can run models from two different frameworks on the same Amazon EC2 instance. To switch frameworks, use the console to stop the instance and then start it again, rather than rebooting it, because the Elastic Inference accelerator doesn't detach when you reboot the instance.

To use the Elastic Inference accelerator with TensorFlow
  1. From the command line of your Amazon EC2 instance, pull the TensorFlow Elastic Inference (TF-EI) image from Amazon Elastic Container Registry (Amazon ECR) with the following code. To select an image, see Deep Learning Containers Images. If your instance isn't yet authenticated to the registry, see the login sketch after this procedure.

    docker pull 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-inference-eia:<image_tag>
  2. Clone the GitHub TensorFlow Serving repository, which includes the half_plus_three test model.

    git clone https://github.com/tensorflow/serving.git
  3. Run the container with the entry point for serving the half_plus_three model. You can get the <image_id> by running the docker images command. To confirm that the container started and loaded the model, see the check after this procedure.

    docker run -p 8500:8500 -p 8501:8501 --name tensorflow-inference \
      --mount type=bind,source=$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata/saved_model_half_plus_three,target=/models/saved_model_half_plus_three \
      -e MODEL_NAME=saved_model_half_plus_three -d <image_id>
  4. Run inference on the same instance by sending a query to the REST API.

    curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://127.0.0.1:8501/v1/models/saved_model_half_plus_three:predict
  5. Optionally, query from another Amazon EC2 instance. Make sure that ports 8500 and 8501 are open in the instance's security group. An example AWS CLI command for opening these ports follows this procedure.

    curl -d '{"instances": [1.0, 2.0, 5.0]}' -X POST http://<ec2_public_ip_address>:8501/v1/models/saved_model_half_plus_three:predict
  6. The results should look something like the following.

    { "predictions": [2.5, 3.0, 4.5 ] }