TensorFlow 2 Elastic Inference with Python

With Elastic Inference TensorFlow 2 Serving, the standard TensorFlow 2 Serving interface remains unchanged. The only difference is that the entry point is a different binary named amazonei_tensorflow2_model_server.

TensorFlow 2 Serving and Predictor are the only inference modes that Elastic Inference supports. If you haven't tried TensorFlow 2 Serving before, we recommend that you try the TensorFlow Serving tutorial first.

This release of Elastic Inference TensorFlow Serving has been tested to perform well and provide cost-saving benefits with the following deep learning use cases and network architectures (and similar variants):

Use Case                      Example Network Topology

Image Recognition             Inception, ResNet, MVCNN
Object Detection              SSD, RCNN
Neural Machine Translation    GNMT

Note

These tutorials assume use of a DLAMI v42 or later with Elastic Inference enabled TensorFlow 2.

Activate the Tensorflow 2 Elastic Inference Environment

Activate the Python 3 TensorFlow 2 Elastic Inference environment:

$ source activate amazonei_tensorflow2_p36

Use Elastic Inference with TensorFlow 2 Serving

The following is an example of serving a Single Shot Detector (SSD) with a ResNet backbone.

To serve and test inference with an SSD ResNet model
  1. Download the model.

    curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip
  2. Unzip the model.

    unzip ssd_resnet.zip -d /tmp
  3. Download a picture of three dogs to your home directory.

    curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg
  4. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

    /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

    Your output should look like the following:

    { "ei_client_version": "1.5.0", "time": "Fri Nov 1 03:09:38 2019", "attached_accelerators": 2, "devices": [ { "ordinal": 0, "type": "eia1.xlarge", "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy" }, { "ordinal": 1, "type": "eia1.xlarge", "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy" } ] }
  5. Navigate to the folder where AmazonEI_TensorFlow_Serving is installed and run the following command to launch the server. Set EI_VISIBLE_DEVICES to the device ordinal or device ID of the attached Elastic Inference accelerator that you want to use. This device will then be accessible using id 0. model_base_path must be an absolute path. For more information on EI_VISIBLE_DEVICES, see Monitoring Elastic Inference Accelerators.

    EI_VISIBLE_DEVICES=<ordinal number> amazonei_tensorflow2_model_server --model_name=ssdresnet --model_base_path=/tmp/ssd_resnet50_v1_coco --port=9000
  6. While the server is running in the foreground, open a new terminal session and activate the TensorFlow 2 Elastic Inference environment.

    source activate amazonei_tensorflow2_p36
  7. Use your preferred text editor to create a script that has the following content. Name it ssd_resnet_client.py. This script will take an image filename as a parameter and get a prediction result from the pretrained model.

    from __future__ import print_function
    import grpc
    import tensorflow as tf
    from PIL import Image
    import numpy as np
    import time
    import os
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2_grpc

    tf.compat.v1.app.flags.DEFINE_string('server', 'localhost:9000',
                                         'PredictionService host:port')
    tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
    FLAGS = tf.compat.v1.app.flags.FLAGS

    coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
    local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
    # Download the COCO class labels to a local file.
    os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
    NUM_PREDICTIONS = 5
    with open(local_coco_classes_txt) as f:
        classes = ["No Class"] + [line.strip() for line in f.readlines()]

    def main(_):
        channel = grpc.insecure_channel(FLAGS.server)
        stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
        # Send the request with the image supplied on the command line.
        with Image.open(FLAGS.image) as f:
            f.load()
            # See prediction_service.proto for gRPC request/response details.
            data = np.asarray(f)
            data = np.expand_dims(data, axis=0)
            request = predict_pb2.PredictRequest()
            request.model_spec.name = 'ssdresnet'
            request.inputs['inputs'].CopyFrom(
                tf.make_tensor_proto(data, shape=data.shape))
            result = stub.Predict(request, 60.0)  # 60 second timeout
            outputs = result.outputs
            detection_classes = outputs["detection_classes"]
            detection_classes = tf.make_ndarray(detection_classes)
            num_detections = int(tf.make_ndarray(outputs["num_detections"])[0])
            print("%d detection[s]" % (num_detections))
            class_label = [classes[int(x)] for x in detection_classes[0][:num_detections]]
            print("SSD Prediction is ", class_label)

    if __name__ == '__main__':
        tf.compat.v1.app.run()
  8. Now run the script passing the server location, port, and the dog photo's filename as the parameters.

    python ssd_resnet_client.py --server=localhost:9000 --image 3dogs.jpg

Use Elastic Inference with the TensorFlow 2 EIPredictor API

Elastic Inference TensorFlow packages for Python 3 provide an EIPredictor API. This API function provides you with a flexible way to run models on Elastic Inference accelerators as an alternative to using TensorFlow 2 Serving. The EIPredictor API provides a simple interface to perform repeated inference on a pretrained model. The following code sample shows the available parameters.

Note

accelerator_id should be set to the device's ordinal number, not its ID.

ei_predictor = EIPredictor(model_dir,
                           signature_def_key=None,
                           signature_def=None,
                           input_names=None,
                           output_names=None,
                           tags=None,
                           graph=None,
                           config=None,
                           use_ei=True,
                           accelerator_id=<device ordinal number>)

output_dict = ei_predictor(feed_dict)

You can use EIPredictor in the following ways:

# The EIPredictor class picks inputs and outputs from the default serving
# signature def with tag "serve". (similar to TF predictor)
ei_predictor = EIPredictor(model_dir)

# The EIPredictor class picks inputs and outputs from the signature def
# selected with signature_def_key. (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def_key='predict')

# signature_def can be provided directly. (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, signature_def=sig_def)

# You provide the input_names and output_names dicts.
# (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, input_names, output_names)

# tags is used to get the correct signature def. (similar to TF predictor)
ei_predictor = EIPredictor(model_dir, tags='serve')

Additional EI Predictor functionality includes the following:

  • Support for frozen models.

    # For frozen graphs, model_dir takes a file name; input_names and
    # output_names must be provided in this case.
    ei_predictor = EIPredictor(model_dir,
                               input_names=input_names,
                               output_names=output_names)
  • Ability to disable use of Elastic Inference by using the use_ei flag, which defaults to True. This is useful for testing EIPredictor against the TensorFlow 2 Predictor (see the sketch after this list).

  • EIPredictor can also be created from a TensorFlow 2 Estimator. Given a trained Estimator, you can first export a SavedModel. See the SavedModel documentation for more details. The following shows example usage:

    saved_model_dir = estimator.export_savedmodel(my_export_dir, serving_input_fn)
    ei_predictor = EIPredictor(export_dir=saved_model_dir)

    # Once the EIPredictor is created, inference is done using the following:
    output_dict = ei_predictor(feed_dict)
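
The following is a minimal sketch of using the use_ei flag to compare EIPredictor running on an attached accelerator against a plain local run of the same SavedModel. The model directory and dummy input are placeholders borrowed from the SSD example in this topic; substitute your own model and feed_dict.

    import numpy as np
    from ei_for_tf.python.predictor.ei_predictor import EIPredictor

    # Placeholder SavedModel path and dummy input; replace with your own.
    model_dir = '/tmp/ssd_resnet50_v1_coco/1/'
    feed_dict = {'inputs': np.zeros((1, 512, 512, 3), dtype=np.uint8)}

    # use_ei=True (the default) runs on the Elastic Inference accelerator
    # with device ordinal 0.
    ei_predictor = EIPredictor(model_dir, use_ei=True, accelerator_id=0)

    # use_ei=False runs the same model locally, which is useful for
    # comparing results against TensorFlow 2 Predictor behavior.
    local_predictor = EIPredictor(model_dir, use_ei=False)

    ei_output = ei_predictor(feed_dict)
    local_output = local_predictor(feed_dict)
    print(sorted(ei_output.keys()) == sorted(local_output.keys()))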

Use Elastic Inference with TensorFlow 2 Predictor Example

Installing Elastic Inference TensorFlow 2

Elastic Inference enabled TensorFlow 2 comes bundled in the AWS Deep Learning AMI. You can also download the pip wheels for Python 3 from the Elastic Inference S3 bucket. Follow these instructions to download and install the pip package:

  1. Choose the tar file for the Python version and operating system of your choice from the S3 bucket. Copy the path to the tar file and run the following command:

    curl -O [URL of the tar file of your choice]
  2. To extract the tar file, run the following command:

    tar -xvzf [name of tar file]
  3. Install the wheel using pip as shown in the following:

    pip install -U [name of untarred folder]/[name of tensorflow whl]
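
To confirm the installation, you can optionally import the package from Python. This is a minimal check, assuming the wheel installs the ei_for_tf package used by the examples in this topic:

    python -c "import tensorflow as tf; from ei_for_tf.python.predictor.ei_predictor import EIPredictor; print(tf.__version__)"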

To serve other models, such as a Single Shot Detector (SSD) with a ResNet backbone, using the EIPredictor API, try the following example.

To serve and test inference with an SSD model
  1. Download and unzip the model. If you already have the model, skip this step.

    curl -O https://s3-us-west-2.amazonaws.com/aws-tf-serving-ei-example/ssd_resnet.zip
    unzip ssd_resnet.zip -d /tmp
  2. Download a picture of three dogs to your current directory.

    curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/3dogs.jpg
  3. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

    /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

    Your output should look like the following:

    { "ei_client_version": "1.5.0", "time": "Fri Nov 1 03:09:38 2019", "attached_accelerators": 2, "devices": [ { "ordinal": 0, "type": "eia1.xlarge", "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy" }, { "ordinal": 1, "type": "eia1.xlarge", "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy" } ] }

    You use the device ordinal of your desired Elastic Inference accelerator to create a Predictor.

  4. Open a text editor, such as vim, and paste the following inference script. Replace the accelerator_id value with the device ordinal of the desired Elastic Inference accelerator. This value must be an integer. Save the file as ssd_resnet_predictor.py.

    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import os
    import sys
    import numpy as np
    import tensorflow as tf
    import matplotlib.image as mpimg
    from ei_for_tf.python.predictor.ei_predictor import EIPredictor

    tf.compat.v1.app.flags.DEFINE_string('image', '', 'path to image in JPEG format')
    FLAGS = tf.compat.v1.app.flags.FLAGS

    coco_classes_txt = "https://raw.githubusercontent.com/amikelive/coco-labels/master/coco-labels-paper.txt"
    local_coco_classes_txt = "/tmp/coco-labels-paper.txt"
    # Download the COCO class labels to a local file.
    os.system("curl -o %s -O %s" % (local_coco_classes_txt, coco_classes_txt))
    NUM_PREDICTIONS = 5
    with open(local_coco_classes_txt) as f:
        classes = ["No Class"] + [line.strip() for line in f.readlines()]

    def get_output(eia_predictor, test_input):
        pred = None
        for curpred in range(NUM_PREDICTIONS):
            pred = eia_predictor(test_input)

        num_detections = int(pred["num_detections"])
        print("%d detection[s]" % (num_detections))
        detection_classes = pred["detection_classes"][0][:num_detections]
        print([classes[int(i)] for i in detection_classes])

    def main(_):
        img = mpimg.imread(FLAGS.image)
        img = np.expand_dims(img, axis=0)
        ssd_resnet_input = {'inputs': img}

        print('Running SSD Resnet on EIPredictor using specified input and outputs')
        eia_predictor = EIPredictor(
            model_dir='/tmp/ssd_resnet50_v1_coco/1/',
            input_names={"inputs": "image_tensor:0"},
            output_names={"detection_classes": "detection_classes:0",
                          "num_detections": "num_detections:0",
                          "detection_boxes": "detection_boxes:0"},
            accelerator_id=0
        )
        get_output(eia_predictor, ssd_resnet_input)

        print('Running SSD Resnet on EIPredictor using default Signature Def')
        eia_predictor = EIPredictor(
            model_dir='/tmp/ssd_resnet50_v1_coco/1/',
        )
        get_output(eia_predictor, ssd_resnet_input)

    if __name__ == "__main__":
        tf.compat.v1.app.run()
  5. Run the inference script.

    python ssd_resnet_predictor.py --image 3dogs.jpg

For more tutorials and examples, see the TensorFlow Python API.

Use Elastic Inference with the TensorFlow 2 Keras API

The Keras API has become an integral part of the machine learning development cycle because of its simplicity and ease of use. Keras enables rapid prototyping and development of machine learning constructs. Elastic Inference provides an API that offers native support for Keras. Using this API, you can directly use your Keras model, h5 file, and weights to instantiate a Keras-like Object. This object supports the native Keras prediction APIs, while fully utilizing Elastic Inference in the backend. Currently, EIKerasModel is only supported in Graph Mode. The following code sample shows the available parameters:

EIKerasModel(model,
             weights=None,
             export_dir=None):
  """Constructs an `EIKerasModel` instance.

  Args:
    model: A model object that either has its weights already set, or will be
      set with the weights argument. A model file that can be loaded.
    weights (Optional): A weights object, or weights file that can be loaded,
      and will be set to the model object.
    export_dir: A folder location to save your model as a SavedModelBundle.

  Raises:
    RuntimeError: If eager execution is enabled.
  """

EIKerasModel can be used as follows:

# Loading from a Keras Model object
from ei_for_tf.python.keras.ei_keras import EIKerasModel

model = Model()  # Build the Keras model in the normal fashion
x = # input data
ei_model = EIKerasModel(model)  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras h5 file
from ei_for_tf.python.keras.ei_keras import EIKerasModel

x = # input data
ei_model = EIKerasModel("keras_model.h5")  # Only additional step to use EI
res = ei_model.predict(x)

# Loading from a Keras JSON file and a weights file
from ei_for_tf.python.keras.ei_keras import EIKerasModel

x = # input data
ei_model = EIKerasModel("keras_model.json", weights="keras_weights.h5")  # Only additional step to use EI
res = ei_model.predict(x)

Additionally, Elastic Inference enabled Keras includes Predict API Support as follows:

# tf.keras
def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            max_queue_size=10,          # Not supported
            workers=1,                  # Not supported
            use_multiprocessing=False): # Not supported

# Native Keras
def predict(x,
            batch_size=None,
            verbose=0,
            steps=None,
            callbacks=None): # Not supported
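
As a minimal usage sketch, only the supported arguments above are passed; ei_model and x are assumed to come from one of the loading patterns shown earlier, and the particular values are illustrative only.

    # batch_size, verbose, and steps are the supported keyword arguments;
    # max_queue_size, workers, use_multiprocessing, and callbacks are not.
    res = ei_model.predict(x, batch_size=1, verbose=1)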

TensorFlow 2 Keras API Example

In this example, you use a trained ResNet-50 model to classify an image of an African Elephant from ImageNet.

To test inference with a Keras model
  1. Activate the Elastic Inference TensorFlow Conda Environment

    source activate amazonei_tensorflow2_p36
  2. Download an image of an African Elephant to your current directory.

    curl -O https://upload.wikimedia.org/wikipedia/commons/5/59/Serengeti_Elefantenbulle.jpg
  3. Open a text editor, such as vim, and paste the following inference script. Save the file as test_keras.py.

    # ResNet Example
    from tensorflow.keras.applications.resnet50 import ResNet50
    from tensorflow.keras.preprocessing import image
    from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
    from ei_for_tf.python.keras.ei_keras import EIKerasModel
    import numpy as np
    import time
    import os
    import tensorflow as tf

    tf.compat.v1.disable_eager_execution()

    ITERATIONS = 20

    model = ResNet50(weights='imagenet')
    ei_model = EIKerasModel(model)

    folder_name = os.path.dirname(os.path.abspath(__file__))
    img_path = folder_name + '/Serengeti_Elefantenbulle.jpg'
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)

    # Warm up both models
    _ = model.predict(x)
    _ = ei_model.predict(x)

    # Benchmark both models
    for each in range(ITERATIONS):
        start = time.time()
        preds = model.predict(x)
        print("Vanilla iteration %d took %f" % (each, time.time() - start))
    for each in range(ITERATIONS):
        start = time.time()
        ei_preds = ei_model.predict(x)
        print("EI iteration %d took %f" % (each, time.time() - start))

    # Decode the results into a list of tuples (class, description, probability)
    # (one such list for each sample in the batch)
    print('Predicted:', decode_predictions(preds, top=3)[0])
    print('EI Predicted:', decode_predictions(ei_preds, top=3)[0])
  4. Run the inference script as follows:

    python test_keras.py
  5. Your output should be a list of predictions and their respective confidence score.

    ('Predicted:', [(u'n02504458', u'African_elephant', 0.9081173), (u'n01871265', u'tusker', 0.07836755), (u'n02504013', u'Indian_elephant', 0.011482777)])
    ('EI Predicted:', [(u'n02504458', u'African_elephant', 0.90811676), (u'n01871265', u'tusker', 0.07836751), (u'n02504013', u'Indian_elephant', 0.011482781)])

For more tutorials and examples, see the TensorFlow Python API.

Use Elastic Inference with SageMaker Neo compiled models

Amazon Elastic Inference supports TensorFlow models optimized by SageMaker Neo for TensorFlow versions 2.3 or greater. A pre-trained TensorFlow model can be compiled in SageMaker Neo with EIA as the target device. The resulting model artifacts can be used for inference in Elastic Inference Accelerators.

Compilation for the EIA target device uses TF-TRT (TensorFlow with TensorRT) and provides a performance boost by optimizing the model to produce low latency inferences. This increases inference throughput and reduces costs. See NVIDIA's TF-TRT user guide for more information.

You can compile your TensorFlow model with the AWS CLI, the Amazon SageMaker console, or the Amazon SageMaker SDK. In each case, select ml_eia2 as your target device. See Use Neo to Compile a Model for detailed information on how to compile your model.
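
For illustration, the following is a minimal sketch of starting a compilation job with the ml_eia2 target through the AWS SDK for Python (boto3). The job name, role ARN, S3 locations, and the input name and shape in DataInputConfig are placeholders; replace them with your own values.

    import boto3

    sm_client = boto3.client('sagemaker')

    # All names, ARNs, S3 paths, and the input shape below are placeholders.
    sm_client.create_compilation_job(
        CompilationJobName='tf2-model-eia-compilation',
        RoleArn='arn:aws:iam::111122223333:role/MySageMakerRole',
        InputConfig={
            'S3Uri': 's3://my-bucket/models/model.tar.gz',
            'DataInputConfig': '{"inputs": [1, 224, 224, 3]}',
            'Framework': 'TENSORFLOW',
        },
        OutputConfig={
            'S3OutputLocation': 's3://my-bucket/compiled/',
            'TargetDevice': 'ml_eia2',  # Elastic Inference accelerator target
        },
        StoppingCondition={'MaxRuntimeInSeconds': 900},
    )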

The same TensorFlow Serving and Predictor interfaces that are supported by Elastic Inference can be used for compiled models as well. See Use Elastic Inference with TensorFlow 2 Serving, Use Elastic Inference with the TensorFlow 2 EIPredictor API, or Use Elastic Inference with TensorFlow 2 Predictor Example for more information.

TensorFlow models can be compiled with two different precision modes: FP32 and FP16. SageMaker Neo uses FP32 for EIA compilations by default. Compared to FP32, FP16 can improve the model's inference performance without sacrificing much accuracy. Models compiled with FP16 precision usually provide accuracy within 0.1% of the accuracy of the same models compiled with FP32 precision. FP16 precision is not preferred if the model's weights or inputs contain values that exceed plus or minus 65504.
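
To request FP16 precision, the compilation job's OutputConfig can carry a CompilerOptions JSON string. The precision_mode key below mirrors TF-TRT's setting and is an assumption; confirm the exact option name in the SageMaker Neo documentation for your framework version.

    # Assumed OutputConfig requesting FP16; verify CompilerOptions against the
    # SageMaker Neo documentation before use.
    output_config = {
        'S3OutputLocation': 's3://my-bucket/compiled/',
        'TargetDevice': 'ml_eia2',
        'CompilerOptions': '{"precision_mode": "FP16"}',
    }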

Compared to their uncompiled versions, models compiled for EIA usually require larger host memory at runtime (inference time). This might lead to Out Of Memory (OOM) issues in Elastic Inference Accelerators, particularly on smaller accelerators such as eia2.medium. If this occurs, upgrade the accelerator to a larger size or use the uncompiled model instead.