PyTorch Elastic Inference with Python
The Amazon Elastic Inference enabled version of PyTorch lets you use Elastic Inference seamlessly, with few changes to your PyTorch code. The following tutorial shows how to perform inference using an Elastic Inference accelerator.
Elastic Inference enabled PyTorch is only available with Amazon Deep Learning Containers version 27 and later.
Topics
Install Elastic Inference Enabled PyTorch
Preinstalled Elastic Inference Enabled PyTorch
The Elastic Inference enabled packages are available in the AWS Deep Learning AMI.
Installing Elastic Inference Enabled PyTorch
If you're not using an AWS Deep Learning AMI instance, you can download the packages from the Amazon S3 bucket.
Activate the PyTorch Elastic Inference Environment
If you are using the AWS Deep Learning AMI, activate the Python 3 Elastic Inference enabled PyTorch environment. Python 2 is not supported for Elastic Inference enabled PyTorch.
- For PyTorch 1.3.1, run the following to activate the environment:
  source activate amazonei_pytorch_p36
- For PyTorch 1.5.1, run the following to activate the environment:
  source activate amazonei_pytorch_latest_p36
- For PyTorch 1.5.1 in the Deep Learning AMI (Amazon Linux 2), run the following to activate the environment:
  source activate amazonei_pytorch_latest_p37
If you are using a different AMI or a container, access the environment where PyTorch is installed.
The remaining parts of this guide assume you are using one of these PyTorch environments. If you are switching from an MXNet or TensorFlow Elastic Inference environment, you must stop and then start your instance in order to reattach the Elastic Inference accelerator. Rebooting is not sufficient, because switching frameworks requires a complete shutdown.
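If you want to confirm that the environment you activated actually provides an Elastic Inference enabled build of PyTorch, a quick check such as the following can help. This is a minimal sketch, not part of the official tutorial: it assumes the PyTorch 1.5.1 environments, where the torcheia package used later in this guide is installed. In the PyTorch 1.3.1 environment, Elastic Inference support is built into torch.jit.optimized_execution instead, so the import is expected to fail there.

# Minimal sanity check (a sketch): confirm the active environment provides
# PyTorch and, for the 1.5.1 environments, the torcheia package.
import torch

print('PyTorch version:', torch.__version__)

try:
    import torcheia  # present in the Elastic Inference enabled PyTorch 1.5.1 environments
    print('torcheia is available')
except ImportError:
    print('torcheia not found; this is expected for the PyTorch 1.3.1 environment')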
Use Elastic Inference with PyTorch for inference
With Elastic Inference enabled PyTorch, the inference API is largely unchanged. However, you must use the torch.jit.optimized_execution() context to trace or script your models into TorchScript and then perform inference. There are also differences between the PyTorch 1.3.1 and 1.5.1 APIs, which are demonstrated in the following tutorial.
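The following minimal sketch summarizes that API difference; the full scripts appear later in this tutorial. It assumes the Elastic Inference enabled PyTorch 1.3.1 environment for the first part, and the 1.5.1 lines are shown as comments because they require the torcheia package from the 1.5.1 environment.

# A minimal sketch of the API difference between the two versions.
import torch, torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)

# PyTorch 1.3.1: optimized_execution takes a second argument naming the
# Elastic Inference device, and is used both for tracing and for inference.
with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
    traced = torch.jit.trace(model, example)

# PyTorch 1.5.1: attach the accelerator to a TorchScript model first,
# then run it inside a plain one-argument optimized_execution context.
# import torcheia
# scripted = torch.jit.script(model)
# torch._C._jit_set_profiling_executor(False)
# eia_model = torcheia.jit.attach_eia(scripted, 0)
# with torch.no_grad(), torch.jit.optimized_execution(True):
#     output = eia_model.forward(example)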
Run Inference with a ResNet-50 Model
To run inference using Elastic Inference enabled PyTorch, do the following.
- Download a picture of a cat to your current directory.
  curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
- Download a list of ImageNet class mappings to your current directory.
  wget https://aws-dlc-sample-models.s3.amazonaws.com/pytorch/imagenet_classes.txt
- Use the built-in EI Tool to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.
  - For PyTorch 1.3.1, run the following:
    /opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json
  - For PyTorch 1.5.1, run the following:
    ~/anaconda3/envs/amazonei_pytorch_latest_p36/lib/python3.6/site-packages/torcheia/bin/ei describe-accelerators --json
  - For PyTorch 1.5.1 in the Deep Learning AMI (Amazon Linux 2), run the following:
    ~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/torcheia/bin/ei describe-accelerators --json
Your output should look like the following:
{ "ei_client_version": "1.5.0", "time": "Fri Nov 1 03:09:38 2019", "attached_accelerators": 2, "devices": [ { "ordinal": 0, "type": "eia1.xlarge", "id": "eia-679e4c622d584803aed5b42ab6a97706", "status": "healthy" }, { "ordinal": 1, "type": "eia1.xlarge", "id": "eia-6c414c6ee37a4d93874afc00825c2f28", "status": "healthy" } ] }
You use the device ordinal of your desired Elastic Inference accelerator to run inference.
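If you prefer to pick the ordinal programmatically rather than by reading the output, a small script along these lines can parse the JSON. This is a minimal sketch; the accelerators.json file name is a hypothetical example for the saved describe-accelerators output.

# Minimal sketch: select the first healthy accelerator's ordinal from the
# JSON produced by `ei describe-accelerators --json`, saved to accelerators.json
# (a hypothetical file name used only for this example).
import json

with open('accelerators.json') as f:
    info = json.load(f)

healthy = [d['ordinal'] for d in info['devices'] if d['status'] == 'healthy']
if not healthy:
    raise RuntimeError('No healthy Elastic Inference accelerators attached')

device_ordinal = healthy[0]
print('Using device ordinal:', device_ordinal)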
- Use your preferred text editor to create a script that has the following content. Name it pytorch_resnet50_inference.py. This script uses ImageNet pretrained TorchVision model weights for ResNet-50, a popular convolutional neural network for image classification. It traces the model with an image tensor and saves it. The script then loads the saved model, performs inference on the input, and prints out the top predicted ImageNet classes. The implementation of the script differs between PyTorch 1.3.1 and 1.5.1.

For PyTorch 1.3.1
This script uses the torch.jit.optimized_execution context, which is necessary to use the Elastic Inference accelerator. If you don't use the torch.jit.optimized_execution context correctly, then inference runs entirely on the client instance and doesn't use the attached accelerator. The Elastic Inference enabled PyTorch framework accepts two parameters for this context, while the vanilla PyTorch framework accepts only one parameter. The second parameter is used to specify the accelerator device ordinal. target_device should be set to the device's ordinal number, not its ID. Ordinals are numbered beginning with 0.

Note: This script specifies the CPU device when loading the model. This avoids potential problems if the model was traced and saved using a GPU context.
import torch, torchvision
import PIL
from torchvision import transforms
from PIL import Image

def get_image(filename):
    im = Image.open(filename)
    # ImageNet pretrained models require input images to have width/height of 224
    # and color channels normalized according to the ImageNet distribution.
    im_process = transforms.Compose([transforms.Resize([224, 224]),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                          std=[0.229, 0.224, 0.225])])
    im = im_process(im) # 3 x 224 x 224
    return im.unsqueeze(0) # Add dimension to become 1 x 3 x 224 x 224

im = get_image('kitten.jpg')

# eval() toggles inference mode
model = torchvision.models.resnet50(pretrained=True).eval()

# Compile model to TorchScript via tracing.
# Here we want to use the first attached accelerator, so we specify ordinal 0.
with torch.jit.optimized_execution(True, {'target_device': 'eia:0'}):
    # You can trace with any input
    model = torch.jit.trace(model, im)

# Serialize model
torch.jit.save(model, 'resnet50_traced.pt')

# Deserialize model
model = torch.jit.load('resnet50_traced.pt', map_location=torch.device('cpu'))

# Perform inference. Make sure to disable autograd and use the EI execution context.
# Replace device ordinal with the ordinal reported by the EI Tool.
with torch.no_grad():
    with torch.jit.optimized_execution(True, {'target_device': 'eia:device ordinal'}):
        probs = model(im)

# Torchvision implementation doesn't have Softmax as last layer.
# Use Softmax to convert activations to range 0-1 (probabilities).
probs = torch.nn.Softmax(dim=1)(probs)

# Get top 5 predicted classes
classes = eval(open('imagenet_classes.txt').read())
pred_probs, pred_indices = torch.topk(probs, 5)
pred_probs = pred_probs.squeeze().numpy()
pred_indices = pred_indices.squeeze().numpy()

for i in range(len(pred_indices)):
    curr_class = classes[pred_indices[i]]
    curr_prob = pred_probs[i]
    print('{}: {:.4f}'.format(curr_class, curr_prob))

For PyTorch 1.5.1
This script uses the torcheia.jit.attach_eia API to attach an accelerator device to a model. If you don't attach the device using torcheia.jit.attach_eia correctly, then inference runs entirely on the client instance and doesn't use the attached accelerator. The Elastic Inference enabled PyTorch framework accepts two parameters for this API. The second parameter specifies the accelerator device ordinal, and it should be set to the device's ordinal number, not its ID. Ordinals are numbered beginning with 0.

In the script, torcheia.jit.attach_eia uses PyTorch's freeze module API, so the returned model has no attributes from the original model except for the forward method. torcheia.jit.attach_eia also allocates resources on the accelerator for every model it returns, so it should be called as few times as possible. For example, avoid calling it in for loops. There are only some rare circumstances where you might need to re-attach the Elastic Inference device. For example, if you change any attributes in the original model object, you must re-attach the Elastic Inference device using torcheia.jit.attach_eia.

Note: This script specifies the CPU device when loading the model. This avoids potential problems if the model was traced and saved using a GPU context.
import torch, torcheia, torchvision
import PIL
from torchvision import transforms
from PIL import Image

def get_image(filename):
    im = Image.open(filename)
    # ImageNet pretrained models require input images to have width/height of 224
    # and color channels normalized according to the ImageNet distribution.
    im_process = transforms.Compose([transforms.Resize([224, 224]),
                                     transforms.ToTensor(),
                                     transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                          std=[0.229, 0.224, 0.225])])
    im = im_process(im) # 3 x 224 x 224
    return im.unsqueeze(0) # Add dimension to become 1 x 3 x 224 x 224

im = get_image('kitten.jpg')

# eval() toggles inference mode
model = torchvision.models.resnet50(pretrained=True).eval()

# Convert model to TorchScript
model = torch.jit.script(model)

# Serialize model
torch.jit.save(model, 'resnet50_traced.pt')

# Deserialize model
model = torch.jit.load('resnet50_traced.pt', map_location=torch.device('cpu'))

# Disable the profiling executor. This is required for Elastic Inference.
torch._C._jit_set_profiling_executor(False)

# Attach the accelerator using its device ordinal
eia_model = torcheia.jit.attach_eia(model, 0)

# Perform inference. Make sure to disable autograd.
with torch.no_grad():
    with torch.jit.optimized_execution(True):
        probs = eia_model.forward(im)

# Torchvision implementation doesn't have Softmax as last layer.
# Use Softmax to convert activations to range 0-1 (probabilities).
probs = torch.nn.Softmax(dim=1)(probs)

# Get top 5 predicted classes
classes = eval(open('imagenet_classes.txt').read())
pred_probs, pred_indices = torch.topk(probs, 5)
pred_probs = pred_probs.squeeze().numpy()
pred_indices = pred_indices.squeeze().numpy()

for i in range(len(pred_indices)):
    curr_class = classes[pred_indices[i]]
    curr_prob = pred_probs[i]
    print('{}: {:.4f}'.format(curr_class, curr_prob))
Note: For any PyTorch version, you don't have to save and load your model. You can compile your model and then run inference with it directly, as shown in the sketch below. The benefit of saving your model is that it saves time for future inference jobs.
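As an illustration of that note, a minimal sketch of the PyTorch 1.5.1 flow without the save and load steps might look like the following. It assumes the Elastic Inference enabled PyTorch 1.5.1 environment and an accelerator at ordinal 0; the input tensor here is a placeholder rather than the kitten image used in the tutorial.

# Minimal sketch (PyTorch 1.5.1): compile the model and run inference directly,
# without serializing it to disk first.
import torch, torcheia, torchvision

model = torchvision.models.resnet50(pretrained=True).eval()
model = torch.jit.script(model)               # compile to TorchScript in memory

torch._C._jit_set_profiling_executor(False)   # required for Elastic Inference
eia_model = torcheia.jit.attach_eia(model, 0) # attach the accelerator at ordinal 0

example = torch.rand(1, 3, 224, 224)          # placeholder input tensor
with torch.no_grad():
    with torch.jit.optimized_execution(True):
        probs = eia_model.forward(example)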
- Run the inference script.
  python pytorch_resnet50_inference.py
Your output should be similar to the following. The model predicts that the image is most likely to be a tabby cat, followed by a tiger cat.
Using Amazon Elastic Inference Client Library Version: 1.6.2
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-53ab0670550948e88d7aac0bd331a583
Elastic Inference Accelerator Type: eia2.medium
Elastic Inference Accelerator Ordinal: 0

tabby, tabby cat: 0.4674
tiger cat: 0.4526
Egyptian cat: 0.0667
plastic bag: 0.0025
lynx, catamount: 0.0007