MXNet Elastic Inference with Deep Java Library (DJL)
The Amazon Elastic Inference (EI) accelerator library lets you use EI seamlessly, with few changes to your Apache MXNet (incubating) code. The Deep Java Library (DJL) supports EI by partitioning the model graph and optimizing it for the EIA backend.
Note
This topic covers using Elastic Inference enabled MXNet version 1.7.0 and later. For information about using Elastic Inference enabled MXNet 1.5.1 and earlier, see MXNet Elastic Inference 1.5.1 with Java.
Environment Setup
Set up Elastic Inference and DJL with the MXNet engine by completing the following steps.
Setup for Elastic Inference
DJL requires JDK 8 or later to be installed on the machine. To use EI from MXNet, download the EI binary and set the MXNET_EXTRA_LIBRARY_PATH environment variable to the path of your EI binary. For example, run the following commands to get the required EI library:
```shell
curl -o ei.whl https://amazonei-apachemxnet.s3.amazonaws.com/eimx-1.0-py2.py3-none-manylinux1_x86_64.whl
unzip ei.whl
export MXNET_EXTRA_LIBRARY_PATH=$(pwd)/eimx/libeimx.so
```
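Before launching the JVM, it helps to confirm that the variable actually points at the extracted shared library. The sketch below is runnable as-is only because it creates a temporary stand-in for libeimx.so; in a real setup, substitute the path produced by the unzip step above:

```shell
# Create a stand-in for the extracted EI library so this sketch runs anywhere;
# in a real setup, use the eimx/libeimx.so file produced by the unzip step.
EIMX_DIR=$(mktemp -d)
touch "$EIMX_DIR/libeimx.so"

# Export the variable in the same shell that will launch the DJL application,
# since the JVM must inherit it from its environment.
export MXNET_EXTRA_LIBRARY_PATH="$EIMX_DIR/libeimx.so"

[ -f "$MXNET_EXTRA_LIBRARY_PATH" ] && echo "EI library path set"
```

If the final check prints nothing, the path is wrong and MXNet will fail to load the extra library at startup.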
Setup for DJL with MXNet engine
When setting up DJL, there are no special instructions. Add DJL and the DJL dependencies for MXNet as usual. Here is a sample of the Gradle dependencies:
```groovy
dependencies {
    implementation "ai.djl:api:0.10.0"
    implementation "ai.djl.mxnet:mxnet-engine:0.10.0"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto:1.7.0-backport"
}
```
As shown in the previous step, import the DJL API, MXNet engine, and MXNet native packages. For more information, read the DJL MXNet documentation.
Note
EI is supported only for DJL 0.10.0 with MXNet 1.7.0 for the 1.0 version of the eimx package.
Using DJL with MXNet on EI
For information about DJL, see the DJL quick start.
DJL supports EI only for models that were built and exported from Apache MXNet. Models trained in DJL or with the other DJL engines are not currently supported.
DJL can load models using the ModelZoo.loadModel(criteria) method. loadModel accepts a single argument, criteria, which describes the model that you are trying to load, where it is located, what pre-processing and post-processing to use, and other model loading options. While criteria is often used for searching and filtering the built-in DJL model zoo, it can also be used to load custom models from various sources, including local files, HTTP web locations, a JAR in your classpath, and a bucket in Amazon S3. For more information, see the DJL model loading documentation.
In general, all you need to do to support EI for MXNet inference using DJL is to add the following option to your criteria:

```java
.optOption("MxOptimizeFor", "EIA") // Use EI Acceleration
```
Example
To show how the inference process works, the following is a Gradle setup for a simple image classification example. We provide a template project that you can run using the following commands:
```shell
curl -O https://djl-ai.s3.amazonaws.com/resources/demo/eia/eia.zip
unzip eia.zip
cd eia
./gradlew run
```
Inside the package, you will find a README.md that contains instructions for running the project. Now let's take a look at the key components of this package.
build.gradle
The following code loads the DJL API package and MXNet dependencies.
```groovy
plugins {
    id 'application'
}

group = 'ai.djl.examples'
version = '0.0.1-SNAPSHOT'

repositories {
    jcenter()
}

application {
    mainClassName = System.getProperty("main", "ai.djl.examples.Example")
}

dependencies {
    implementation "ai.djl:api:0.10.0"
    implementation "ai.djl.mxnet:mxnet-model-zoo:0.10.0"
    runtimeOnly "ai.djl.mxnet:mxnet-native-auto:1.7.0-backport"
}
```
Example.java
The following is part of the Example.java file. It shows the core steps to load the model and run inference.
```java
String modelUrl =
    "https://alpha-djl-demos.s3.amazonaws.com/model/djl-blockrunner/mxnet_resnet18.zip?model_name=resnet18_v1";

// Build criteria to load the model
Criteria<Image, Classifications> criteria = Criteria.builder()
        .setTypes(Image.class, Classifications.class)
        .optModelUrls(modelUrl)
        .optOption("MxOptimizeFor", "EIA") // Use EI Acceleration
        .optTranslator(ImageClassificationTranslator.builder()
                .addTransform(new Resize(224, 224))
                .addTransform(new ToTensor())
                .optApplySoftmax(true)
                .build())
        .build();

// Run inference with DJL
try (ZooModel<Image, Classifications> model = ModelZoo.loadModel(criteria);
        Predictor<Image, Classifications> predictor = model.newPredictor()) {
    // Load the image
    String imageURL =
        "https://raw.githubusercontent.com/awslabs/djl/master/examples/src/test/resources/kitten.jpg";
    Image image = ImageFactory.getInstance().fromUrl(imageURL);
    // Run inference
    System.out.println(predictor.predict(image));
}
```
The sample output log looks like the following:
```
src/eia_lib.cc:264 MXNet version 10700 supported
[22:36:31] src/c_api/c_api.cc:354: Found 1 operators in library
[22:36:31] src/c_api/c_api.cc:419: Op[0] _eia_subgraph_op
[22:36:31] src/c_api/c_api.cc:420: isSubgraphOp
[22:36:31] src/c_api/c_api.cc:988: Found 1 partitioners in library
[22:36:31] src/c_api/c_api.cc:1004: Partitioner[0] EIA
[22:36:31] src/c_api/c_api.cc:1026: Strategy[0] strategy1 subgraphOp: '_eia_subgraph_op'
[22:36:31] src/c_api/c_api.cc:1049: Found 0 graph passes in library
[22:36:31] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.5.0. Attempting to upgrade...
[22:36:31] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
Using Amazon Elastic Inference Client Library Version: 1.8.0
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-dd4389f3d32043da924e2cc90076d58d
Elastic Inference Accelerator Type: eia1.large
Elastic Inference Accelerator Ordinal: 0
[
        class: "n02123045 tabby, tabby cat", probability: 0.41073
        class: "n02124075 Egyptian cat", probability: 0.29393
        class: "n02123159 tiger cat", probability: 0.19337
        class: "n02123394 Persian cat", probability: 0.04586
        class: "n02127052 lynx, catamount", probability: 0.00911
]
```
Troubleshooting
The following are issues that you might run into and possible solutions.
- If you see the error Deep Learning Engine not Found, it is most likely because of one of the following reasons:
  - Unsatisfied link error: DJL requires Amazon Linux 2, Ubuntu 16.04, or later versions to run the MXNet engine. This issue is typically caused by a mismatch between the system and the package versions.
  - No write access to the cache folder: DJL defaults to caching content in the $HOME/.djl.ai folder. You might receive this error if you don't have write access to this location. You can set the DJL_CACHE_DIR environment variable to use an alternative cache directory. For more information, see Resource Caches in the DJL documentation.
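To work around the write-access issue, you can point DJL at a directory you can write to before starting the application. A minimal sketch, using a temporary directory purely as an illustrative location:

```shell
# Redirect DJL's cache to a writable location (a temp dir here, purely illustrative).
# DJL reads DJL_CACHE_DIR at startup, so export it in the shell that launches
# the application (for example, before running ./gradlew run).
export DJL_CACHE_DIR="$(mktemp -d)/djl-cache"
mkdir -p "$DJL_CACHE_DIR"
echo "DJL cache redirected to $DJL_CACHE_DIR"
```

In practice you would choose a stable directory (not a temp dir), since the cache stores downloaded models and native libraries between runs.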
- If you see either of the following error messages:

```
src/c_api/c_api_symbolic.cc:1498: Error optimizing for backend 'EIA' cannot be found
```

```
Exception in thread "main" ai.djl.engine.EngineException: No deep learning engine found.
...
Caused by: ai.djl.engine.EngineException: Failed to load MXNet native library
...
Caused by: java.io.FileNotFoundException: Extra Library not found: /home/ubuntu/eimx/eimx/libeimx.so
```

  This means that the MXNET_EXTRA_LIBRARY_PATH environment variable is not set, it points to a file other than the libeimx.so library, or it points to a file that does not exist.
- If your inference speed does not improve, check whether your log contains something similar to the following:
```
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator ID: eia-#########################
Elastic Inference Accelerator Type: eiaX.YYYYYY
Elastic Inference Accelerator Ordinal: 0
```
EI-accelerated inference should always print this information to identify the backend in use, and no additional errors should be thrown during the inference process.
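One simple way to verify this is to grep a captured run log for the accelerator banner. The sketch below writes a small sample log first so that it is runnable as-is; against a real log, a count of zero would mean the inference did not go through EI:

```shell
# Sample log lines standing in for a real captured run (illustrative only)
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
Using Amazon Elastic Inference Client Library Version: 1.8.0
Number of Elastic Inference Accelerators Available: 1
Elastic Inference Accelerator Type: eia1.large
EOF

# Count banner lines; 0 means inference did not go through EI
grep -c "Elastic Inference" "$LOG_FILE"
```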
- For all other issues, refer to the DJL troubleshooting page.