PyTorch Elastic Inference with Python

The Amazon Elastic Inference enabled version of PyTorch lets you use Elastic Inference seamlessly, with few changes to your PyTorch code. The following tutorial shows how to perform inference using an Elastic Inference accelerator.

Note

Elastic Inference enabled PyTorch is only available with Amazon Deep Learning Containers version 27 and later.

Install Elastic Inference Enabled PyTorch

Preinstalled Elastic Inference Enabled PyTorch

The Elastic Inference enabled packages are available in the AWS Deep Learning AMI. You also have Docker container options through the Amazon Deep Learning Containers.

Installing Elastic Inference Enabled PyTorch

If you're not using an AWS Deep Learning AMI instance, you can download the packages from the Amazon S3 bucket and build them into your own Amazon Linux or Ubuntu AMIs.

Activate the PyTorch Elastic Inference Environment

If you are using the AWS Deep Learning AMI, activate the Python 3 Elastic Inference enabled PyTorch environment. Python 2 is not supported for Elastic Inference enabled PyTorch.

  • For PyTorch 1.3.1, run the following to activate the environment:

    source activate amazonei_pytorch_p36
  • For PyTorch 1.5.1, run the following to activate the environment:

    source activate amazonei_pytorch_latest_p36
  • For PyTorch 1.5.1 in Deep Learning AMI (Amazon Linux 2), run the following to activate the environment:

    source activate amazonei_pytorch_latest_p37
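
After activating an environment, you can quickly confirm which PyTorch version it provides. The following is a minimal check; the version you see depends on which environment you activated (for example, 1.3.1 or 1.5.1).

    # Confirm the PyTorch version provided by the activated environment.
    import torch
    print(torch.__version__)  # for example, 1.3.1 or 1.5.1 depending on the environment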

If you are using a different AMI or a container, access the environment where PyTorch is installed.

The remaining parts of this guide assume you are using one of these PyTorch environments. If you are switching from an MXNet or TensorFlow Elastic Inference environment, you must stop and then start your instance in order to reattach the Elastic Inference accelerator. Rebooting is not sufficient because switching frameworks requires a complete shutdown.
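
If you manage your instance programmatically, the following is a minimal sketch of the stop-and-start sequence using boto3. The instance ID is a placeholder, and the sketch assumes that boto3 is installed and that your AWS credentials and default Region are configured.

    # Stop and then start the instance so the Elastic Inference accelerator is reattached.
    # A reboot is not sufficient when switching frameworks.
    import boto3

    ec2 = boto3.client('ec2')
    instance_id = 'i-1234567890abcdef0'  # placeholder; use your own instance ID

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_stopped').wait(InstanceIds=[instance_id])

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter('instance_running').wait(InstanceIds=[instance_id])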

Use Elastic Inference with PyTorch for inference

With Elastic Inference enabled PyTorch, the inference API is largely unchanged. However, you must use the torch.jit.optimized_execution() context to trace or script your models into TorchScript and then perform inference. There are also differences between the PyTorch 1.3.1 and 1.5.1 APIs, which are demonstrated in the following tutorial.

Run Inference with a ResNet-50 Model

To run inference using Elastic Inference enabled PyTorch, do the following.

  1. Download a picture of a cat to your current directory.

    curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
  2. Download a list of ImageNet class mappings to your current directory.

    wget https://aws-dlc-sample-models.s3.amazonaws.com/pytorch/imagenet_classes.txt
  3. Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

    • For PyTorch 1.3.1, run the following:

      /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json
    • For PyTorch 1.5.1, run the following:

      ~/anaconda3/envs/amazonei_pytorch_latest_p36/lib/python3.6/site-packages/torcheia/bin/ei describe-accelerators --json
    • For PyTorch 1.5.1 in Deep Learning AMI (Amazon Linux 2), run the following:

      ~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torcheia/bin/ei describe-accelerators --json

    Your output should look like the following:

    { "ei_client_version": "1.5.0", "time": "Fri Nov 1 03:09:38 2019", "attached_accelerators": 2, "devices": [ { "ordinal": 0, "type": "eia1.xlarge", "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy" }, { "ordinal": 1, "type": "eia1.xlarge", "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy" } ] }

    You use the device ordinal of your desired Elastic Inference accelerator to run inference. For one way to select an ordinal programmatically from this output, see the sketch after this procedure.

  4. Use your preferred text editor to create a script that has the following content. Name it pytorch_resnet50_inference.py. This script uses ImageNet pretrained TorchVision model weights for ResNet-50, a popular convolutional neural network for image classification. It traces the weights with an image tensor and saves it. The script then loads the saved model, performs inference on the input, and prints out the top predicted ImageNet classes. The implementation of the script differs between PyTorch 1.3.1 and 1.5.1.

    For PyTorch 1.3.1

    This script uses the torch.jit.optimized_execution context, which is necessary to use the Elastic Inference accelerator. If you don't use the torch.jit.optimized_execution context correctly, then inference runs entirely on the client instance and doesn't use the attached accelerator. The Elastic Inference enabled PyTorch framework accepts two parameters for this context, while the vanilla PyTorch framework accepts only one parameter. The second parameter is used to specify the accelerator device ordinal. target_device should be set to the device's ordinal number, not its ID. Ordinals are numbered beginning with 0.

    Note

    This script specifies the CPU device when loading the model. This avoids potential problems if the model was traced and saved using a GPU context.

    import torch, torchvision
    import PIL
    from torchvision import transforms
    from PIL import Image

    def get_image(filename):
        im = Image.open(filename)
        # ImageNet pretrained models require input images to have width/height of 224
        # and color channels normalized according to the ImageNet distribution.
        im_process = transforms.Compose([
            transforms.Resize([224, 224]),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
        im = im_process(im)  # 3 x 224 x 224
        return im.unsqueeze(0)  # Add dimension to become 1 x 3 x 224 x 224

    im = get_image('kitten.jpg')

    # eval() toggles inference mode
    model = torchvision.models.resnet50(pretrained=True).eval()

    # Compile model to TorchScript via tracing.
    # Here we want to use the first attached accelerator, so we specify ordinal 0.
    with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
        # You can trace with any input
        model = torch.jit.trace(model, im)

    # Serialize model
    torch.jit.save(model, 'resnet50_traced.pt')

    # Deserialize model
    model = torch.jit.load('resnet50_traced.pt', map_location=torch.device('cpu'))

    # Perform inference. Make sure to disable autograd and use the EI execution context.
    # Replace 0 with the device ordinal of the accelerator you want to use.
    with torch.no_grad():
        with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
            probs = model(im)

    # The torchvision implementation doesn't have Softmax as the last layer.
    # Use Softmax to convert activations to the range 0-1 (probabilities).
    probs = torch.nn.Softmax(dim=1)(probs)

    # Get the top 5 predicted classes
    classes = eval(open('imagenet_classes.txt').read())
    pred_probs, pred_indices = torch.topk(probs, 5)
    pred_probs = pred_probs.squeeze().numpy()
    pred_indices = pred_indices.squeeze().numpy()

    for i in range(len(pred_indices)):
        curr_class = classes[pred_indices[i]]
        curr_prob = pred_probs[i]
        print('{}: {:.4f}'.format(curr_class, curr_prob))

    For PyTorch 1.5.1

    This script uses the torch.jit.attach_eia API to attach an accelerator device to a model. If you don't attach the device using torch.jit.attach_eia correctly, then inference runs entirely on the client instance and doesn't use the attached accelerator. torch.jit.attach_eia accepts two parameters: the model and the accelerator device ordinal. Set the device ordinal to the device's ordinal number, not its ID. Ordinals are numbered beginning with 0.

    In the script, torch.jit.attach_eia uses PyTorch’s freeze module API, so the returned model has no attributes from the original model except for the forward method. torch.jit.attach_eia also allocates resources on the accelerator for every model it returns, so call it as few times as possible; for example, avoid calling it inside loops (see the attach-once sketch after this procedure). You need to re-attach the Elastic Inference device only in rare circumstances. For example, if you change any attributes of the original model object, you must re-attach the device using torch.jit.attach_eia.

    Note

    This script specifies the CPU device when loading the model. This avoids potential problems if the model was traced and saved using a GPU context.

    import torch, torcheia, torchvision
    import PIL
    from torchvision import transforms
    from PIL import Image

    def get_image(filename):
        im = Image.open(filename)
        # ImageNet pretrained models require input images to have width/height of 224
        # and color channels normalized according to the ImageNet distribution.
        im_process = transforms.Compose([
            transforms.Resize([224, 224]),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
        im = im_process(im)  # 3 x 224 x 224
        return im.unsqueeze(0)  # Add dimension to become 1 x 3 x 224 x 224

    im = get_image('kitten.jpg')

    # eval() toggles inference mode
    model = torchvision.models.resnet50(pretrained=True).eval()

    # Convert model to TorchScript
    model = torch.jit.script(model)

    # Serialize model
    torch.jit.save(model, 'resnet50_traced.pt')

    # Deserialize model
    model = torch.jit.load('resnet50_traced.pt', map_location=torch.device('cpu'))

    # Disable the profiling executor. This is required for Elastic Inference.
    torch._C._jit_set_profiling_executor(False)

    # Attach the accelerator using its device ordinal
    eia_model = torcheia.jit.attach_eia(model, 0)

    # Perform inference. Make sure to disable autograd.
    with torch.no_grad():
        with torch.jit.optimized_execution(True):
            probs = eia_model.forward(im)

    # The torchvision implementation doesn't have Softmax as the last layer.
    # Use Softmax to convert activations to the range 0-1 (probabilities).
    probs = torch.nn.Softmax(dim=1)(probs)

    # Get the top 5 predicted classes
    classes = eval(open('imagenet_classes.txt').read())
    pred_probs, pred_indices = torch.topk(probs, 5)
    pred_probs = pred_probs.squeeze().numpy()
    pred_indices = pred_indices.squeeze().numpy()

    for i in range(len(pred_indices)):
        curr_class = classes[pred_indices[i]]
        curr_prob = pred_probs[i]
        print('{}: {:.4f}'.format(curr_class, curr_prob))

    Note

    For any PyTorch version, you don't have to save and load your model. You can compile your model and then run inference with it directly. The benefit of saving your model is that it saves time for future inference jobs.

  5. Run the inference script.

    python pytorch_resnet50_inference.py

    Your output should be similar to the following. The model predicts that the image is most likely to be a tabby cat, followed by a tiger cat.

    Using Amazon Elastic Inference Client Library Version: 1.6.2
    Number of Elastic Inference Accelerators Available: 1
    Elastic Inference Accelerator ID: eia-53ab0670550948e88d7aac0bd331a583
    Elastic Inference Accelerator Type: eia2.medium
    Elastic Inference Accelerator Ordinal: 0

    tabby, tabby cat: 0.4674
    tiger cat: 0.4526
    Egyptian cat: 0.0667
    plastic bag: 0.0025
    lynx, catamount: 0.0007
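
If more than one accelerator is attached, you can also select a device ordinal programmatically from the describe-accelerators output shown in step 3, rather than reading it off the console. The following is a minimal sketch; it assumes you redirected the JSON output to a file named accelerators.json (the file name is only an example).

    # Pick a device ordinal from the EI Tool's describe-accelerators JSON output.
    # Assumes the output from step 3 was saved, for example:
    #   /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json > accelerators.json
    import json

    with open('accelerators.json') as f:
        info = json.load(f)

    # Keep only healthy accelerators and list their ordinals.
    healthy = [d for d in info['devices'] if d['status'] == 'healthy']
    for device in healthy:
        print('ordinal {}: {} ({})'.format(device['ordinal'], device['type'], device['id']))

    # Use the first healthy accelerator's ordinal, for example as the second argument to
    # torcheia.jit.attach_eia, or as the ordinal in the target_device string.
    ordinal = healthy[0]['ordinal']
    print('Using device ordinal', ordinal)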
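
As noted in step 4, for PyTorch 1.5.1 you should call torch.jit.attach_eia as few times as possible, and you don't have to save and load the model before running inference. The following is a minimal sketch of that pattern: the model is compiled and attached once, then reused for several inferences inside a loop. The list of image files is a placeholder, and the preprocessing matches step 4.

    # PyTorch 1.5.1 sketch: attach the accelerator once, reuse the attached model.
    import torch, torcheia, torchvision
    from torchvision import transforms
    from PIL import Image

    preprocess = transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

    # Compile the model to TorchScript and attach the accelerator exactly once.
    model = torchvision.models.resnet50(pretrained=True).eval()
    model = torch.jit.script(model)
    torch._C._jit_set_profiling_executor(False)    # required for Elastic Inference
    eia_model = torcheia.jit.attach_eia(model, 0)  # 0 = device ordinal from step 3

    # Reuse the attached model inside the loop; do not call attach_eia here.
    image_files = ['kitten.jpg']  # placeholder; extend with your own images
    with torch.no_grad():
        with torch.jit.optimized_execution(True):
            for filename in image_files:
                im = preprocess(Image.open(filename)).unsqueeze(0)
                probs = torch.nn.Softmax(dim=1)(eia_model.forward(im))
                top_prob, top_index = torch.topk(probs, 1)
                print('{}: class index {}, probability {:.4f}'.format(
                    filename, int(top_index), float(top_prob)))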