Apache MXNet on AWS
Developer Guide

Step 4: Train a Model

Now, train a simple model that recognizes handwritten digits using the MNIST dataset. First, create a Jupyter notebook, and then write Python code to train the model.

For more information about MNIST, see the MNIST documentation.

Step 4.1: Create a Jupyter Notebook

To write MXNet code for training a model, you use a Jupyter notebook.

Create the notebook

  1. Set up the Jupyter notebook. For instructions, see Set up a Jupyter Notebook. Follow the steps to configure both the server (the EC2 instance) and your client.

  2. Connect your client to the Jupyter server. For more information, see Step 4: Test by Logging in to the Jupyter Server.

    1. In a browser window, type the URL of the Jupyter server in the address bar.

      • For Windows clients, use the public DNS name of the EC2 instance followed by the port number, which is typically 8888 (for example, https://<EC2-public-DNS-name>:8888).

      • For macOS and Linux clients, use localhost (127.0.0.1) followed by the port number.

    2. If the connection is successful, the home page of the Jupyter notebook server appears. Type the password that you created when you configured the Jupyter server.

  3. Create a Jupyter notebook, choosing the Python 2 option.

    Now you are ready to write code.

Step 4.2: Train the Model

Use Python to incrementally add Apache MXNet code to the Jupyter notebook. In each step, run the code before going to the next step.

The code uses the MNIST dataset to train the model to recognize handwritten digits.

Train the model

  1. Download the MNIST dataset and prepare training and validation data by copying and pasting the following code into the Jupyter notebook and running it.

    import numpy as np
    import os
    import urllib
    import gzip
    import struct

    def download_data(url, force_download=True):
        '''
        Download the file from the given url and return the file name.
        '''
        fname = url.split("/")[-1]
        if force_download or not os.path.exists(fname):
            urllib.urlretrieve(url, fname)
        return fname

    def read_data(label_url, image_url):
        '''
        Download labels (from the label_url) and images (from the image_url).
        Load them into NumPy arrays (label and image).
        Example: label[0] corresponds to the label of the image at image[0].
        '''
        with gzip.open(download_data(label_url)) as flbl:
            magic, num = struct.unpack(">II", flbl.read(8))
            label = np.fromstring(flbl.read(), dtype=np.int8)
        with gzip.open(download_data(image_url), 'rb') as fimg:
            magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
            image = np.fromstring(fimg.read(), dtype=np.uint8).reshape(len(label), rows, cols)
        return (label, image)

    '''
    Download the MNIST dataset, a collection of 28x28 grayscale images of
    handwritten digits. Download the training data (60,000 samples) to train
    the model, and the validation data (10,000 samples) to test the model.
    '''
    path = 'http://yann.lecun.com/exdb/mnist/'
    (train_lbl, train_img) = read_data(
        path+'train-labels-idx1-ubyte.gz', path+'train-images-idx3-ubyte.gz')
    (val_lbl, val_img) = read_data(
        path+'t10k-labels-idx1-ubyte.gz', path+'t10k-images-idx3-ubyte.gz')
  2. Copy and paste the following code into the notebook and run it. The code plots the first 10 images in the training dataset.

    '''
    Plot the first 10 images in the training dataset.
    '''
    %matplotlib inline
    import matplotlib.pyplot as plt
    for i in range(10):
        plt.subplot(1, 10, i+1)
        plt.imshow(train_img[i], cmap='Greys_r')
        plt.axis('off')
    print('label: %s' % (train_lbl[0:10],))

    You should see the first 10 training images rendered in a row, followed by a printed line showing their labels.

  3. Prepare the MXNet data iterators for the training and validation data by copying and pasting the following code into the notebook and running it. Subsequent code uses these data iterators when training the model on the dataset.

    '''
    Create MXNet data iterators to iterate through the training and
    validation datasets. These iterators are used during both training
    and testing.
    '''
    import mxnet as mx

    def to4d(img):
        '''
        Convert a batch of images into a 4-D matrix
        (batch_size, num_channels, width, height). The MNIST dataset is a
        collection of 28x28 grayscale images, so there is only one color
        channel.
        '''
        return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32)/255

    '''
    Prepare data iterators for training and validation, using batches of
    100 images. Shuffling the training data helps the model train faster.
    '''
    batch_size = 100
    train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True)
    val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size)
  4. Define the structure of the neural network for model training by copying and pasting the following code into the notebook and running it. The code uses a fully connected multilayer perceptron (MLP) network.

    '''
    Define the neural network structure for training the model.
    Use a fully connected multilayer perceptron (MLP) network.
    '''
    # Create a placeholder variable for the input data.
    data = mx.sym.Variable('data')
    # Flatten the data from 4-D shape (batch_size, num_channel, width, height)
    # into 2-D (batch_size, num_channel*width*height).
    data = mx.sym.Flatten(data=data)
    # The first fully connected layer with relu activation function.
    fc1 = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=128)
    act1 = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")
    # The second fully connected layer with relu activation function.
    fc2 = mx.sym.FullyConnected(data=act1, name='fc2', num_hidden=64)
    act2 = mx.sym.Activation(data=fc2, name='relu2', act_type="relu")
    # The third fully connected layer. Note that the hidden size is 10,
    # the number of unique digits (0-9).
    fc3 = mx.sym.FullyConnected(data=act2, name='fc3', num_hidden=10)
    # The softmax loss layer.
    mlp = mx.sym.SoftmaxOutput(data=fc3, name='softmax')
    # Visualize the network structure with output size (the batch_size is ignored).
    shape = {"data": (batch_size, 1, 28, 28)}
    mx.viz.plot_network(symbol=mlp, shape=shape)
  5. Train the model by copying and pasting the following code into the notebook and then running it.

    import logging
    logging.getLogger().setLevel(logging.DEBUG)

    model = mx.mod.Module(
        symbol = mlp             # network structure
    )
    model.fit(
        train_iter,              # training data
        eval_data=val_iter,      # validation data
        batch_end_callback = mx.callback.Speedometer(batch_size, 200), # output progress for every 200 data batches
        num_epoch = 10,          # number of data passes for training
        optimizer = 'sgd',
        optimizer_params = (('learning_rate', 0.1),)
    )
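The read_data helper in step 1 relies on the IDX file layout: a big-endian header unpacked with struct, followed by the raw label or pixel bytes. The following self-contained sketch builds a miniature label file in memory (a made-up three-label example, using Python 3's np.frombuffer in place of the older np.fromstring) and parses it the same way:

```python
import gzip
import io
import struct

import numpy as np

# Build a miniature in-memory label file in the IDX format that read_data
# expects: a big-endian header (magic number, item count) followed by one
# byte per label. The labels here are arbitrary example values.
labels = [5, 0, 4]
raw = struct.pack(">II", 2049, len(labels)) + bytes(labels)
blob = io.BytesIO()
with gzip.GzipFile(fileobj=blob, mode="wb") as f:
    f.write(raw)
blob.seek(0)

# Parse it the same way read_data parses the downloaded label file.
with gzip.GzipFile(fileobj=blob, mode="rb") as flbl:
    magic, num = struct.unpack(">II", flbl.read(8))
    parsed = np.frombuffer(flbl.read(), dtype=np.int8)

print(magic, num, parsed.tolist())  # 2049 3 [5, 0, 4]
```

The image files follow the same pattern, with a four-field ">IIII" header (magic, count, rows, cols) instead of two fields.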
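The to4d helper in step 3 is just a reshape plus a rescale. A minimal sketch with synthetic pixel data (random values standing in for real MNIST images) shows the transformation:

```python
import numpy as np

# Two fake 28x28 grayscale images with uint8 pixel values in [0, 255].
img = np.random.randint(0, 256, size=(2, 28, 28)).astype(np.uint8)

# The same transform as to4d: add a channel axis to get the 4-D shape
# (batch_size, num_channels, width, height) and rescale pixels to [0, 1].
batch = img.reshape(img.shape[0], 1, 28, 28).astype(np.float32) / 255

print(batch.shape)  # (2, 1, 28, 28)
```

The division by 255 matters: keeping inputs in a small, consistent range helps gradient-based training behave well.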
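The layer sizes chosen in step 4 determine how many parameters the model learns. A quick back-of-the-envelope count, assuming the standard weight-plus-bias parameterization of a fully connected layer:

```python
# Each FullyConnected layer learns a weight matrix (n_in x n_out) plus one
# bias per output unit. Layer sizes follow the MLP defined in step 4.
layers = [(28 * 28, 128),  # fc1: flattened 28x28 image -> 128 hidden units
          (128, 64),       # fc2
          (64, 10)]        # fc3: one output per digit class (0-9)

total = sum(n_in * n_out + n_out for n_in, n_out in layers)
print(total)  # 109386
```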
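The 'sgd' optimizer selected in step 5 applies the plain stochastic gradient descent update, w <- w - learning_rate * gradient, after each batch. A one-step sketch with made-up weight and gradient values:

```python
import numpy as np

# One plain SGD step with the learning_rate passed to model.fit above.
# The weight and gradient values here are arbitrary examples.
lr = 0.1
w = np.array([0.5, -0.3])
grad = np.array([1.0, -2.0])

w = w - lr * grad  # w is now approximately [0.4, -0.1]
print(w)
```

With learning_rate set to 0.1 and num_epoch set to 10, the model makes ten full passes over the 60,000 training images, adjusting the weights after every 100-image batch.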

Next Step

Step 5: Test the Model