MXNet Elastic Inference with Python - Amazon Elastic Inference

MXNet Elastic Inference with Python

The Amazon Elastic Inference (Elastic Inference) enabled version of Apache MXNet lets you use Elastic Inference seamlessly, with few changes to your MXNet code. To use an existing MXNet inference script, make one change in the code. Wherever you set the context to bind your model, such as mx.cpu() or mx.gpu(), update this to use mx.eia() instead.
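A script that has to run on both the Elastic Inference enabled build and a standard MXNet build could select its context once, in one place. The following is a minimal sketch, not part of the MXNet API: `pick_context` is a hypothetical helper, and it assumes that only the Elastic Inference enabled build exposes `mx.eia`. The module is passed in as an argument, so the sketch can be exercised here with stand-in objects instead of a real MXNet installation.

```python
from types import SimpleNamespace


def pick_context(mx_module):
    """Return mx.eia() when the module exposes it, otherwise mx.cpu().

    Assumption: only the Elastic Inference enabled build defines `eia`.
    """
    eia = getattr(mx_module, "eia", None)
    if callable(eia):
        return eia()
    return mx_module.cpu()


# Stand-ins for the two builds so the sketch runs without MXNet installed.
ei_build = SimpleNamespace(eia=lambda: "eia(0)", cpu=lambda: "cpu(0)")
standard_build = SimpleNamespace(cpu=lambda: "cpu(0)")

print(pick_context(ei_build))
print(pick_context(standard_build))
```

In a real script you would call `pick_context(mx)` after `import mxnet as mx` and pass the result wherever the context is bound.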

Elastic Inference Enabled Apache MXNet

For more information on MXNet setup, see Apache MXNet on AWS.

Preinstalled Elastic Inference Enabled MXNet

Elastic Inference enabled Apache MXNet is available in the AWS Deep Learning AMI.

Installing Elastic Inference Enabled MXNet

If you're not using an AWS Deep Learning AMI instance, a 'pip' package is available on Amazon S3 so you can build it into your own Amazon Linux or Ubuntu AMIs using the following command:

pip install "latest-wheel"

Activate the MXNet Elastic Inference Environment

If you are using the AWS Deep Learning AMI, activate the Python 3 MXNet Elastic Inference environment or Python 2 MXNet Elastic Inference environment, depending on your version of Python.

For Python 3:

source activate amazonei_mxnet_p36

For Python 2:

source activate amazonei_mxnet_p27

If you are using a different AMI or a container, access the environment where MXNet is installed.

Validate MXNet for Elastic Inference Setup

Verify that you've properly set up your instance with Elastic Inference.

$ python ~/anaconda3/bin/EISetupValidator.py

If your instance is not properly set up with an accelerator, running any of the examples in this section will result in the following error:

Error: Failed to query accelerator metadata. Failed to detect any accelerator

For detailed instructions on how to launch an AWS Deep Learning AMI with an Elastic Inference accelerator, see the Elastic Inference documentation.

Check MXNet for Elastic Inference Version

You can verify that MXNet is available to use and check the current version with the following code from the Python terminal:

>>> import mxnet as mx
>>> mx.__version__
'1.4.1'

This returns the version number of the regular, non-Elastic Inference release of MXNet on GitHub that the installed build corresponds to.
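The Troubleshooting section notes that a model trained on an earlier MXNet version runs on a later Elastic Inference build, while the reverse may lead to undefined behavior. A rough sketch of turning that rule into a check on version strings follows. The function names are hypothetical, and comparing only the major and minor numbers is an assumption made for illustration, not a documented compatibility policy.

```python
def parse_version(text):
    """Turn '1.4.1' into (1, 4, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def trained_version_is_supported(trained, runtime):
    """True when the training version is not newer than the runtime version.

    Assumption: only major.minor is compared. Strings that do not start
    with a number (for example 'master') are treated as unsupported.
    """
    trained_parts = parse_version(trained)[:2]
    runtime_parts = parse_version(runtime)[:2]
    if not trained_parts or not runtime_parts:
        return False
    return trained_parts <= runtime_parts


print(trained_version_is_supported("1.3.1", "1.4.1"))
print(trained_version_is_supported("master", "1.4.1"))
```

The runtime string would come from `mx.__version__`; the training version has to be recorded by whoever produced the model files.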

The commit hash number can be used to determine which release of the Elastic Inference-specific version of MXNet is installed using the following code:

import os
import mxnet as mx

# COMMIT_HASH is stored in the installed mxnet package directory
path = os.path.join(mx.__path__[0], 'COMMIT_HASH')
with open(path) as f:
    print(f.read())

You can then compare the commit hash with the Release Notes to find details about the version you have installed.

Using Multiple Elastic Inference Accelerators with MXNet

You can run inference on MXNet when multiple Elastic Inference accelerators are attached to a single Amazon EC2 instance. The procedure for using multiple accelerators is the same as using multiple GPUs with MXNet.

Use the built-in EI Tool binary to get the device ordinal number of all attached Elastic Inference accelerators. For more information on EI Tool, see Monitoring Elastic Inference Accelerators.

/opt/amazon/ei/ei_tools/bin/ei describe-accelerators --json

Your output should look like the following:

{
  "ei_client_version": "1.5.0",
  "time": "Fri Nov 1 03:09:38 2019",
  "attached_accelerators": 2,
  "devices": [
    {
      "ordinal": 0,
      "type": "eia1.xlarge",
      "id": "eia-679e4c622d584803aed5b42ab6a97706",
      "status": "healthy"
    },
    {
      "ordinal": 1,
      "type": "eia1.xlarge",
      "id": "eia-6c414c6ee37a4d93874afc00825c2f28",
      "status": "healthy"
    }
  ]
}
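If you want to pick a device ordinal in a script rather than by reading the output, the JSON can be parsed with the standard library. The sketch below is illustrative only: `describe_accelerators` and `healthy_ordinals` are hypothetical helper names, and the field names (`devices`, `ordinal`, `status`) are taken from the sample output above rather than from a published schema.

```python
import json
import subprocess

EI_TOOL = "/opt/amazon/ei/ei_tools/bin/ei"


def describe_accelerators():
    """Run the EI Tool and return its JSON output as text (not called below)."""
    return subprocess.check_output(
        [EI_TOOL, "describe-accelerators", "--json"], universal_newlines=True
    )


def healthy_ordinals(describe_json):
    """Return the ordinals of all devices whose status is 'healthy'."""
    report = json.loads(describe_json)
    return [
        device["ordinal"]
        for device in report.get("devices", [])
        if device.get("status") == "healthy"
    ]


# Shortened copy of the sample output, so the sketch runs without an accelerator.
SAMPLE_OUTPUT = """
{
  "attached_accelerators": 2,
  "devices": [
    {"ordinal": 0, "type": "eia1.xlarge", "status": "healthy"},
    {"ordinal": 1, "type": "eia1.xlarge", "status": "healthy"}
  ]
}
"""

print(healthy_ordinals(SAMPLE_OUTPUT))
```

On an instance with accelerators attached, `healthy_ordinals(describe_accelerators())` would give the candidate values for the device ordinal.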

Replace the device ordinal in the mx.eia(<device ordinal>) call with the device ordinal for your desired Elastic Inference accelerator as follows.

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
mod = mx.mod.Module(symbol=sym, context=mx.eia(<device ordinal>), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))],
         label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)
mod.forward(Batch([img]))

Use Elastic Inference with the MXNet Symbol API

Pass mx.eia() as the context in a call to either the simple_bind() or the bind() method. For more information, see MXNet Symbol API.

Use the mx.eia() context only in the bind call. The following example calls the simple_bind() method with the mx.eia() context:

import mxnet as mx

data = mx.sym.var('data', shape=(1,))
sym = mx.sym.exp(data)

# Pass mx.eia() as context during simple bind operation
executor = sym.simple_bind(ctx=mx.eia(), grad_req='null')
for i in range(10):
    # Forward call is performed on remote accelerator
    executor.forward(data=mx.nd.ones((1,)))
    print('Inference %d, output = %s' % (i, executor.outputs[0]))

The following example calls the bind() method:

import mxnet as mx

a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = 2 * a + b

# Even when running inference on EIA, allocate the input
# NDArrays in the mx.cpu() context
a_data = mx.nd.array([1,2], ctx=mx.cpu())
b_data = mx.nd.array([2,3], ctx=mx.cpu())

# Then in the bind call, use the mx.eia() context
e = c.bind(mx.eia(), {'a': a_data, 'b': b_data})

# Forward call is performed on remote accelerator
e.forward()
print('1st Inference, output = %s' % (e.outputs[0]))

# Subsequent calls can pass new data in a forward call
e.forward(a=mx.nd.ones((2,)), b=mx.nd.ones((2,)))
print('2nd Inference, output = %s' % (e.outputs[0]))

The following example calls the bind() method on a real pre-trained model (ResNet-50) from the Symbol API. Use your preferred text editor to create a script called mxnet_resnet50.py with the following content. This script downloads the ResNet-50 model files (resnet-50-0000.params and resnet-50-symbol.json), a list of labels (synset.txt), and an image of a cat. The cat image is passed to the pre-trained model, and the predicted class indices are looked up in the list of labels to produce the prediction result.

import mxnet as mx
import numpy as np

path='http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params'),
 mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json'),
 mx.test_utils.download(path+'synset.txt')]

ctx = mx.eia()

with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

sym, args, aux = mx.model.load_checkpoint('resnet-50', 0)

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname)
# convert into format (batch, RGB, width, height)
img = mx.image.imresize(img, 224, 224) # resize
img = img.transpose((2, 0, 1)) # Channel first
img = img.expand_dims(axis=0) # batchify
img = img.astype(dtype='float32')
args['data'] = img

softmax = mx.nd.random_normal(shape=(1,))
args['softmax_label'] = softmax

exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')
exe.forward(data=img)
prob = exe.outputs[0].asnumpy()
# print the top-5
prob = np.squeeze(prob)
a = np.argsort(prob)[::-1]
for i in a[0:5]:
    print('probability=%f, class=%s' %(prob[i], labels[i]))

Then run the script. You should see output similar to the following. MXNet optimizes the model graph for Elastic Inference, loads it on the Elastic Inference accelerator, and then runs inference against it:

(amazonei_mxnet_p36) ubuntu@ip-172-31-42-83:~$ python mxnet_resnet50.py
[23:12:03] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[23:12:03] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
Using Amazon Elastic Inference Client Library Version: 1.2.8
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-95ae5a472b2241769656dbb5d344a80e
Elastic Inference Accelerator Type: eia2.large

probability=0.418679, class=n02119789 kit fox, Vulpes macrotis
probability=0.293495, class=n02119022 red fox, Vulpes vulpes
probability=0.029321, class=n02120505 grey fox, gray fox, Urocyon cinereoargenteus
probability=0.026230, class=n02124075 Egyptian cat
probability=0.022557, class=n02085620 Chihuahua
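The last lines of the script sort the class probabilities in descending order and print the five largest together with their labels. The same selection step can be sketched in plain Python as follows. The `top_k` helper is hypothetical, and the probabilities and labels are toy values chosen for illustration, not model output.

```python
def top_k(probabilities, labels, k=5):
    """Return the k (probability, label) pairs with the highest probability."""
    order = sorted(
        range(len(probabilities)),
        key=lambda i: probabilities[i],
        reverse=True,
    )
    return [(probabilities[i], labels[i]) for i in order[:k]]


# Toy values for illustration only; they are not model output.
toy_probs = [0.10, 0.65, 0.05, 0.20]
toy_labels = ["class a", "class b", "class c", "class d"]

for prob, name in top_k(toy_probs, toy_labels, k=2):
    print("probability=%f, class=%s" % (prob, name))
```

The index into the probability vector is also the index into the label list, which is why the labels must be read from synset.txt in file order.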

Use Elastic Inference with the MXNet Module API

When you create the Module object, pass mx.eia() as the context. For more information, see Module API.

To use the MXNet Module API, you can use the following commands:

# Load saved model
sym, arg_params, aux_params = mx.model.load_checkpoint(model_path, EPOCH_NUM)

# Pass mx.eia() as context while creating Module object
mod = mx.mod.Module(symbol=sym, context=mx.eia())

# Only for_training = False is supported for eia
mod.bind(for_training=False, data_shapes=data_shape)
mod.set_params(arg_params, aux_params)

# forward call is performed on remote accelerator
mod.forward(data_batch)

The following script downloads two ResNet-152 model files (resnet-152-0000.params and resnet-152-symbol.json) and a labels list (synset.txt). It also downloads a cat image, runs it through the pre-trained model, and looks up the result in the labels list to return a prediction. Use your preferred text editor to create a script with the following content:

import mxnet as mx
import numpy as np
from collections import namedtuple
Batch = namedtuple('Batch', ['data'])

path='http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'resnet/152-layers/resnet-152-0000.params'),
 mx.test_utils.download(path+'resnet/152-layers/resnet-152-symbol.json'),
 mx.test_utils.download(path+'synset.txt')]

ctx = mx.eia()

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))],
         label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)

with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname)

# convert into format (batch, RGB, width, height)
img = mx.image.imresize(img, 224, 224) # resize
img = img.transpose((2, 0, 1)) # Channel first
img = img.expand_dims(axis=0) # batchify

mod.forward(Batch([img]))
prob = mod.get_outputs()[0].asnumpy()
# print the top-5
prob = np.squeeze(prob)
a = np.argsort(prob)[::-1]
for i in a[0:5]:
    print('probability=%f, class=%s' %(prob[i], labels[i]))

Save this script as test.py.

Use Elastic Inference with the MXNet Gluon API

The Gluon API in MXNet provides a clear, concise, and easy-to-use API for building and training machine learning models. For more information, see the Gluon Documentation.

To use the MXNet Gluon API model for inference-only tasks, you can use the following commands:

Note

Both the model parameters and input array must be allocated with the Elastic Inference context.

import mxnet as mx
from mxnet.gluon import nn

def create():
    net = nn.HybridSequential()
    net.add(nn.Dense(2))
    return net

# get a simple Gluon nn model
net = create()
net.initialize(ctx=mx.cpu())

# copy model parameters to EIA context
net.collect_params().reset_ctx(mx.eia())

# hybridize the model with static alloc
net.hybridize(static_alloc=True, static_shape=True)

# allocate input array in EIA context and run inference
x = mx.nd.random.uniform(-1,1,(3,4),ctx=mx.eia())
predictions = net(x)
print(predictions)

You should be able to see the following output to confirm that you are using Elastic Inference:

Using Amazon Elastic Inference Client Library Version: xxxxxxxx
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-xxxxxxxxxxxxxxxxxxxxxxxx
Elastic Inference Accelerator Type: xxxxxxxx
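When inference runs unattended, it can help to check the captured output for these messages instead of reading it by eye. The following is a small sketch that assumes you have already saved the script's output to a string, for example by redirecting it to a file. `find_accelerator_ids` is a hypothetical helper, and the pattern is based only on the message format shown in this section.

```python
import re

# Matches the accelerator ID line printed when Elastic Inference is in use.
ACCELERATOR_ID_LINE = re.compile(
    r"Elastic Inference Accelerator ID:\s*(eia-[0-9A-Za-z]+)"
)


def find_accelerator_ids(log_text):
    """Return every accelerator ID reported in the captured output."""
    return ACCELERATOR_ID_LINE.findall(log_text)


# Output copied from the ResNet-50 example earlier in this section.
ei_log = (
    "Using Amazon Elastic Inference Client Library Version: 1.2.8\n"
    "Number of Elastic Inference Accelerators Available: 1\n"
    "Elastic Inference Accelerator ID: eia-95ae5a472b2241769656dbb5d344a80e\n"
    "Elastic Inference Accelerator Type: eia2.large\n"
)
# An MKL-DNN message of the kind shown in the Troubleshooting section.
cpu_log = (
    "[21:40:20] src/operator/nn/mkldnn/mkldnn_base.cc:74: "
    "Allocate 147456 bytes with malloc directly\n"
)

print(find_accelerator_ids(ei_log))
print(find_accelerator_ids(cpu_log))
```

An empty result suggests that the run did not reach an accelerator, which the Troubleshooting section associates with a model that was not hybridized with static_alloc=True and static_shape=True.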

Loading parameters

There are a couple of different ways to load Gluon models. One way is to load model parameters from a file and specify the Elastic Inference context like the following:

# save the parameters to a file
net.save_parameters('params.gluon')

# create a new network using saved parameters
net2 = create()
net2.load_parameters('params.gluon', ctx=mx.eia())
net2.hybridize(static_alloc=True, static_shape=True)
predictions = net2(x)
print(predictions)

Loading Symbol and Parameters Files

You can also export the model’s symbol and parameters to a file, then import the model as shown in the following:

# export both symbol and parameters to a file
net2.export('export')

# create a new network using exported network
net3 = nn.SymbolBlock.imports('export-symbol.json', ['data'],
    'export-0000.params', ctx=mx.eia())
net3.hybridize(static_alloc=True, static_shape=True)
predictions = net3(x)

If you have a model exported as symbol and parameter files, you can simply import those files and run inference.

import mxnet as mx
import numpy as np
from mxnet.gluon import nn

ctx = mx.eia()

path='http://data.mxnet.io/models/imagenet/'
[mx.test_utils.download(path+'resnet/50-layers/resnet-50-0000.params'),
 mx.test_utils.download(path+'resnet/50-layers/resnet-50-symbol.json'),
 mx.test_utils.download(path+'synset.txt')]

with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname) # convert into format (batch, RGB, width, height)
img = img.as_in_context(ctx) # image must be with EIA context
img = mx.image.imresize(img, 224, 224) # resize
img = img.transpose((2, 0, 1)) # channel first
img = img.expand_dims(axis=0) # batchify
img = img.astype(dtype='float32') # match data type

resnet50 = nn.SymbolBlock.imports('resnet-50-symbol.json',['data','softmax_label'],
    'resnet-50-0000.params',ctx=ctx) # import hybridized model symbols
label = mx.nd.array([0], ctx=ctx) # dummy softmax label in EIA context
resnet50.hybridize(static_alloc=True, static_shape=True)

prob = resnet50(img, label)
idx = prob.topk(k=5)[0]
for i in idx:
    i = int(i.asscalar())
    print('With prob = %.5f, it contains %s' % (prob[0,i].asscalar(), labels[i]))

Loading From Model Zoo

You can also use pre-trained models from the Gluon model zoo, as shown in the following:

Note

All pre-trained models expect inputs to be normalized in the same way as described in the model zoo documentation.

import mxnet as mx
import numpy as np
from mxnet.gluon.model_zoo import vision

ctx = mx.eia()

mx.test_utils.download('http://data.mxnet.io/models/imagenet/synset.txt')
with open('synset.txt', 'r') as f:
    labels = [l.rstrip() for l in f]

fname = mx.test_utils.download('https://github.com/dmlc/web-data/blob/master/mxnet/doc/tutorials/python/predict_image/cat.jpg?raw=true')
img = mx.image.imread(fname) # convert into format (batch, RGB, width, height)
img = img.as_in_context(ctx) # image must be with EIA context
img = mx.image.imresize(img, 224, 224) # resize
img = mx.image.color_normalize(img.astype(dtype='float32')/255,
                               mean=mx.nd.array([0.485, 0.456, 0.406]),
                               std=mx.nd.array([0.229, 0.224, 0.225])) # normalize
img = img.transpose((2, 0, 1)) # channel first
img = img.expand_dims(axis=0) # batchify

resnet50 = vision.resnet50_v1(pretrained=True, ctx=ctx) # load model in EIA context
resnet50.hybridize(static_alloc=True, static_shape=True) # hybridize

prob = resnet50(img).softmax() # predict and normalize output
idx = prob.topk(k=5)[0] # get top 5 result
for i in idx:
    i = int(i.asscalar())
    print('With prob = %.5f, it contains %s' % (prob[0,i].asscalar(), labels[i]))

Troubleshooting

  • MXNet Elastic Inference is built with MKL-DNN, so all operations using mx.cpu() are supported and will run with the same performance as the standard release. MXNet Elastic Inference does not support mx.gpu(), so all operations using that context will throw an error. Sample error message:

    >>> mx.nd.ones((1),ctx=mx.gpu())
    [20:35:32] src/imperative/./imperative_utils.h:90: GPU support is disabled. Compile MXNet with USE_CUDA=1 to enable GPU support.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/ubuntu/deps/MXNetECL/python/mxnet/ndarray/ndarray.py", line 2421, in ones
        return _internal._ones(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
      File "<string>", line 34, in _ones
      File "/home/ubuntu/deps/MXNetECL/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
        ctypes.byref(out_stypes)))
      File "/home/ubuntu/deps/MXNetECL/python/mxnet/base.py", line 252, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [20:35:32] src/imperative/imperative.cc:79: Operator _ones is not implemented for GPU.
  • Elastic Inference is only for production inference use cases and does not support any model training. When you use either the Symbol API or the Module API, do not call the backward() method or call bind() with for_training=True. This throws an error. Because the default value of for_training is True, make sure you set for_training=False manually in cases such as the example in Use Elastic Inference with the MXNet Module API. Sample error using test.py:

    Traceback (most recent call last):
      File "test.py", line 16, in <module>
        label_shapes=mod._label_shapes)
      File "/home/ec2-user/.local/lib/python3.6/site-packages/mxnet/module/module.py", line 402, in bind
        raise ValueError("for training cannot be set to true with EIA context")
    ValueError: for training cannot be set to true with EIA context
  • For Gluon, do not call training-specific functions or you will receive the following error:

    Traceback (most recent call last):
      File "train_gluon.py", line 44, in <module>
        output = net(data)
      File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/block.py", line 540, in __call__
        out = self.forward(*args)
      File "train_gluon.py", line 24, in forward
        x = self.pool1(F.relu(self.conv1(x)))
      File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/block.py", line 540, in __call__
        out = self.forward(*args)
      File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/block.py", line 909, in forward
        return self._call_cached_op(x, *args)
      File "/usr/local/lib/python2.7/dist-packages/mxnet/gluon/block.py", line 815, in _call_cached_op
        out = self._cached_op(*cargs)
      File "/usr/local/lib/python2.7/dist-packages/mxnet/_ctypes/ndarray.py", line 150, in __call__
        ctypes.byref(out_stypes)))
      File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 252, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [23:10:21] /home/ubuntu/deps/MXNetECL/3rdparty/tvm/nnvm/include/nnvm/graph.h:230: Check failed: it != attrs.end() Cannot find attribute full_ref_count in the graph
  • Because training is not allowed, there is no point in initializing an optimizer for inference.

  • A model trained on an earlier version of MXNet will work on a later version of MXNet Elastic Inference because it is backward compatible (for example, train a model on MXNet 1.3 and run it on MXNet Elastic Inference 1.4). However, you may run into undefined behavior if you train on a later version of MXNet (for example, train a model on MXNet master and run it on MXNet Elastic Inference 1.4).

  • Different sizes of Elastic Inference accelerators have different amounts of GPU memory. If your model requires more GPU memory than is available in your accelerator, you get a message that looks like the log below. If you run into this message, you should use a larger accelerator size with more memory. Stop and restart your instance with a larger accelerator.

    mxnet.base.MXNetError: [06:16:17] src/operator/subgraph/eia/eia_subgraph_op.cc:206: Last Error:
      EI Error Code: [51, 8, 31]
      EI Error Description: Accelerator out of memory. Consider using a larger accelerator.
      EI Request ID: MX-A19B0DE6-7999-4580-8C49-8EA 7ADSFFCB -- EI Accelerator ID: eia-cb0aasdfdfsdf2a acab7
      EI Client Version: 1.2.12
  • For Gluon, remember that both the model and input array (image) must be allocated in the Elastic Inference context. If either the model parameters or an input are allocated in a different context, you will see one of the following errors:

    MXNetError: [21:59:27] src/imperative/cached_op.cc:866: Check failed: inputs[i]->ctx() == default_ctx (eia(0) vs. cpu(0)) CachedOp requires all inputs to live on the same context. But data is on cpu(0) while resnetv10_conv0_weight is on eia(0)
    RuntimeError: Parameter 'resnetv10_conv0_weight' was not initialized on context eia(0). It was only initialized on [cpu(0)].
  • For Gluon, make sure you hybridize the model and pass the static_alloc=True and static_shape=True options. Otherwise, MXNet will run inference in imperative mode on CPU and won’t invoke any Elastic Inference functionality. In this case, you won’t see Elastic Inference info messages, and may see MKLDNN info instead like the following:

    [21:40:20] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 147456 bytes with malloc directly
    [21:40:20] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 3211264 bytes with malloc directly
    [21:40:20] src/operator/nn/mkldnn/mkldnn_base.cc:74: Allocate 9437184 bytes with malloc directly
  • When you are using Symbol/Module API, you should always allocate arrays in the CPU context and bind with the Elastic Inference context. If you allocate arrays in the Elastic Inference context, you will see the following error when you try to bind the model:

    Traceback (most recent call last):
      File "symbol.py", line 43, in <module>
        exe = sym.bind(ctx=ctx, args=args, aux_states=aux, grad_req='null')
      File "/home/ubuntu/.local/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1706, in bind
        ctypes.byref(handle)))
      File "/home/ubuntu/.local/lib/python2.7/site-packages/mxnet/base.py", line 252, in check_call
        raise MXNetError(py_str(_LIB.MXGetLastError()))
    mxnet.base.MXNetError: [00:05:25] src/executor/../common/exec_utils.h:516: Check failed: x == default_ctx Input array is in eia(0) while binding with ctx=cpu(0). All arguments must be in global context (cpu(0)) unless group2ctx is specified for cross-device graph.
  • Calling reshape explicitly through either the Module or the Symbol API, or implicitly by passing input NDArrays of different shapes in different forward passes, can lead to out-of-memory (OOM) errors. When a model is reshaped, the previous copy is not cleaned up on the accelerator until the session is destroyed. In Gluon, running inference with inputs of differing shapes causes the model to re-allocate memory. For Elastic Inference, this means that the model is reloaded on the accelerator, which degrades performance and can also cause OOM errors. MXNet does not support the reshape operation for the EIA context, so using different input data sizes or batch sizes is not supported and may result in the following error. To work around this, either pad your data so that all shapes are the same, or bind the model with each shape you need so that you use multiple executors. The latter option may result in out-of-memory errors because the model is duplicated on the accelerator.

    mxnet.base.MXNetError: [17:06:11] src/operator/subgraph/eia/eia_subgraph_op.cc:224: Last Error:
      EI Error Code: [52, 3, 32]
      EI Error Description: Invalid tensor on accelerator
      EI Request ID: MX-96534015-D443-4EC2-B184-ABBBDB1B150E -- EI Accelerator ID: eia-a9957ab65c5f44de975944a641c86b03
      EI Client Version: 1.3.1
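The padding workaround from the last item can be sketched with plain Python lists. This is an illustration under stated assumptions, not an MXNet utility: `pad_batch` is a hypothetical helper, the samples stand in for whatever per-item arrays your model expects, and the fill sample must have the same shape as a real one.

```python
def pad_batch(samples, batch_size, fill):
    """Pad a partial batch up to batch_size with copies of `fill`.

    Returns the padded batch and the number of real samples, so the
    caller can discard the predictions that belong to padding.
    """
    if len(samples) > batch_size:
        raise ValueError("batch holds more samples than batch_size")
    # The same `fill` object is reused for every padding slot, which is
    # fine as long as the batch is only read, never modified in place.
    padding = [fill] * (batch_size - len(samples))
    return list(samples) + padding, len(samples)


# A final, short batch of two samples padded to a fixed batch size of four.
last_batch = [[1.0, 2.0], [3.0, 4.0]]
padded, real_count = pad_batch(last_batch, batch_size=4, fill=[0.0, 0.0])

print(len(padded), real_count)
```

After the forward pass, keep only the first `real_count` rows of the output. Every call then uses the same input shape, so the model is bound once and does not need to be reshaped.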