PyTorch Elastic Inference with Python - Amazon Elastic Inference

The Amazon Elastic Inference enabled version of PyTorch lets you use Elastic Inference seamlessly, with few changes to your PyTorch code. The following tutorial shows how to perform inference using an Elastic Inference accelerator.


Elastic Inference enabled PyTorch is only available with Amazon Deep Learning Containers version 27 and later.

Install Elastic Inference Enabled PyTorch

Preinstalled Elastic Inference Enabled PyTorch

The Elastic Inference enabled packages are available in the AWS Deep Learning AMI. You also have Docker container options through the Amazon Deep Learning Containers.

Installing Elastic Inference Enabled PyTorch

If you're not using an AWS Deep Learning AMI instance, you can download the packages from the Amazon S3 bucket to build them into your own Amazon Linux or Ubuntu AMIs.

Activate the PyTorch Elastic Inference Environment

If you are using the AWS Deep Learning AMI, activate the Python 3 Elastic Inference enabled PyTorch environment. Python 2 is not supported for Elastic Inference enabled PyTorch.

For Python 3, run the following to activate the environment:

source activate amazonei_pytorch_p36

If you are using a different AMI or a container, access the environment where PyTorch is installed.

The remaining parts of this guide assume you are using the amazonei_pytorch_p36 environment. If you are switching from MXNet or TensorFlow Elastic Inference environments, you must stop and then start your instance in order to reattach the Elastic Inference accelerator. Rebooting is not sufficient since the process of switching frameworks requires a complete shut down.
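
A script can verify that it is running in the expected environment before it imports PyTorch. The following is a minimal sketch, assuming that conda exports the CONDA_DEFAULT_ENV variable when an environment is activated (standard conda behavior). The helper name check_ei_environment is hypothetical and is not part of any AWS tooling.

```python
import os
import sys

EXPECTED_ENV = "amazonei_pytorch_p36"  # environment name used in this guide


def check_ei_environment(environ=None, version_info=None):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration, not part of any AWS tooling.
    """
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # Python 2 is not supported for Elastic Inference enabled PyTorch.
    if version_info[0] < 3:
        problems.append("Python 2 is not supported; use Python 3.")
    # Assumption: conda sets CONDA_DEFAULT_ENV to the active environment name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(
            "Active conda environment is {!r}, expected {!r}.".format(active, EXPECTED_ENV)
        )
    return problems


if __name__ == "__main__":
    for problem in check_ei_environment():
        print("WARNING: " + problem)
```

If you use a different AMI or a container, the environment name will differ, so adjust EXPECTED_ENV or skip the check.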

Use Elastic Inference with PyTorch for inference

With Elastic Inference enabled PyTorch, the inference API is largely unchanged. However, you must use the torch.jit.optimized_execution() context, entered through a with statement, both when you trace or script your models into TorchScript and when you perform inference.

Run Inference with a ResNet-50 Model

To run inference using Elastic Inference enabled PyTorch, do the following.

  1. Download a picture of a cat to your current directory.

    curl -O
  2. Download a list of ImageNet class mappings to your current directory.

  3. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

    /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

    Your output should look like the following:

    {
      "ei_client_version": "1.5.0",
      "time": "Fri Nov 1 03:09:38 2019",
      "attached_accelerators": 2,
      "devices": [
        {
          "ordinal": 0,
          "type": "eia1.xlarge",
          "id": "eia-679e4c622d584803aed5b42ab6a97706",
          "status": "healthy"
        },
        {
          "ordinal": 1,
          "type": "eia1.xlarge",
          "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
          "status": "healthy"
        }
      ]
    }

    You use the device ordinal of your desired Elastic Inference accelerator to run inference.

  4. Use your preferred text editor to create a script with the following content, and save it under a name of your choice. This script uses ImageNet pretrained TorchVision model weights for ResNet-50, a popular convolutional neural network for image classification. It traces the model with an image tensor and saves the traced model. The script then loads the saved model, performs inference on the input, and prints out the top predicted ImageNet classes.

    This script uses the torch.jit.optimized_execution context, which is necessary to use the Elastic Inference accelerator. If you don't use the torch.jit.optimized_execution context correctly, then inference will run entirely on the client instance and won't use the attached accelerator. The Elastic Inference enabled PyTorch framework accepts two parameters for this context, while the vanilla PyTorch framework accepts only one parameter. The second parameter is used to specify the accelerator device ordinal. target_device should be set to the device's ordinal number, not its ID. Ordinals are numbered beginning with 0.


    This script specifies the CPU device when loading the model. This avoids potential problems if the model was traced and saved using a GPU context.

    import ast

    import torch
    import torchvision
    from PIL import Image
    from torchvision import transforms

    # Path for the serialized TorchScript model. Any writable path works;
    # this name is only an example.
    MODEL_PATH = 'resnet50_traced.pt'

    def get_image(filename):
        im = Image.open(filename)
        # ImageNet pretrained models require input images to have width/height of 224
        # and color channels normalized according to the ImageNet distribution.
        im_process = transforms.Compose([
            transforms.Resize([224, 224]),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])
        im = im_process(im)     # 3 x 224 x 224
        return im.unsqueeze(0)  # Add dimension to become 1 x 3 x 224 x 224

    im = get_image('kitten.jpg')

    # eval() toggles inference mode
    model = torchvision.models.resnet50(pretrained=True).eval()

    # Compile model to TorchScript via tracing.
    # Here we want to use the first attached accelerator, so we specify ordinal 0.
    with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
        # You can trace with any input
        model = torch.jit.trace(model, im)

    # Serialize model
    torch.jit.save(model, MODEL_PATH)

    # Deserialize model
    model = torch.jit.load(MODEL_PATH, map_location=torch.device('cpu'))

    # Perform inference. Make sure to disable autograd and use EI execution context.
    # Replace 0 with the device ordinal of the accelerator you want to use.
    with torch.no_grad():
        with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
            probs = model(im)

    # Torchvision implementation doesn't have Softmax as last layer.
    # Use Softmax to convert activations to range 0-1 (probabilities)
    probs = torch.nn.Softmax(dim=1)(probs)

    # Get top 5 predicted classes.
    # literal_eval parses the class mapping without executing arbitrary code.
    with open('imagenet_classes.txt') as f:
        classes = ast.literal_eval(f.read())
    pred_probs, pred_indices = torch.topk(probs, 5)
    pred_probs = pred_probs.squeeze().numpy()
    pred_indices = pred_indices.squeeze().numpy()

    for i in range(len(pred_indices)):
        curr_class = classes[pred_indices[i]]
        curr_prob = pred_probs[i]
        print('{}: {:.4f}'.format(curr_class, curr_prob))

    You don’t have to save and load your model. You can compile your model, then directly do inference with it. The benefit to saving your model is that it will save time for future inference jobs.

  5. Run the inference script.


    Your output should be similar to the following. The model predicts that the image is most likely to be a tabby cat, followed by a tiger cat.

    Using Amazon Elastic Inference Client Library Version: 1.6.2
    Number of Elastic Inference Accelerators Available: 1
    Elastic Inference Accelerator ID: eia-53ab0670550948e88d7aac0bd331a583
    Elastic Inference Accelerator Type: eia2.medium
    Elastic Inference Accelerator Ordinal: 0
    tabby, tabby cat: 0.4674
    tiger cat: 0.4526
    Egyptian cat: 0.0667
    plastic bag: 0.0025
    lynx, catamount: 0.0007
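
Step 3 reads the device ordinal from the EI Tool output by eye. The same selection can be scripted. The following is a rough sketch, assuming the JSON layout shown in step 3 (a devices list whose entries carry ordinal and status fields). The function names and the SAMPLE string are illustrative only; the EI Tool path and the describe-accelerators --json command come from this guide.

```python
import json
import subprocess

EI_TOOL = "/opt/amazon/ei/ei_tools/bin/ei"  # path used in step 3


def healthy_ordinals(describe_json):
    """Return the ordinals of accelerators reported as healthy, ascending."""
    info = json.loads(describe_json)
    return sorted(
        dev["ordinal"]
        for dev in info.get("devices", [])
        if dev.get("status") == "healthy"
    )


def target_device_options(ordinal):
    """Build the second argument for torch.jit.optimized_execution.

    The value uses the ordinal, not the accelerator ID.
    """
    return {"target_device": "eia:{}".format(ordinal)}


def describe_accelerators():
    """Run EI Tool and return its JSON text (needs an instance with EI Tool)."""
    result = subprocess.run(
        [EI_TOOL, "describe-accelerators", "--json"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


# Sample copied from the step 3 output so the sketch runs without an accelerator.
SAMPLE = """
{ "ei_client_version": "1.5.0", "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    { "ordinal": 0, "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy" },
    { "ordinal": 1, "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy" } ] }
"""

ordinals = healthy_ordinals(SAMPLE)
print(target_device_options(ordinals[0]))  # {'target_device': 'eia:0'}
```

On an instance with an attached accelerator, pass the result of describe_accelerators() instead of SAMPLE, then use the returned dictionary as the second argument to torch.jit.optimized_execution in the inference script.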