Overview of how to use the Neptune ML feature - Amazon Neptune

Overview of how to use the Neptune ML feature

Starting workflow for using Neptune ML

Using the Neptune ML feature in Amazon Neptune generally involves the following five steps:

[Diagram: Neptune ML workflow]
  1. Data export and configuration   –   The data-export step uses the Neptune-Export service or the neptune-export command line tool to export data from Neptune into Amazon Simple Storage Service (Amazon S3) in CSV form. A configuration file named training-data-configuration.json is automatically generated at the same time, which specifies how the exported data can be loaded into a trainable graph.

  2. Data preprocessing   –   In this step, the exported dataset is preprocessed using standard techniques to prepare it for model training. Feature normalization can be performed for numeric data, and text features can be encoded using word2vec. At the end of this step, a Deep Graph Library (DGL) graph is generated from the exported dataset for the model training step to use.

    This step is implemented using a SageMaker processing job in your account, and the resulting data is stored in an Amazon S3 location that you have specified.

  3. Model training   –   The model training step trains the machine learning model that will be used for predictions.

    Model training is done in two stages:

    • The first stage uses a SageMaker processing job to generate a model training strategy configuration set that specifies what type of model and model hyperparameter ranges will be used for the model training.

    • The second stage then uses a SageMaker model tuning job to try different hyperparameter configurations and select the training job that produced the best-performing model. The tuning job runs a pre-specified number of model training job trials on the processed data. At the end of this stage, the trained model parameters of the best training job are used to generate model artifacts for inference.

  4. Create an inference endpoint in Amazon SageMaker   –   The inference endpoint is a SageMaker endpoint instance that is launched with the model artifacts produced by the best training job. Each model is tied to a single endpoint. The endpoint is able to accept incoming requests from the graph database and return the model predictions for inputs in the requests. After you have created the endpoint, it stays active until you delete it.

  5. Query the machine learning model using Gremlin   –   You can use extensions to the Gremlin query language to query predictions from the inference endpoint.
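The five steps above can be sketched as the request bodies involved: an export request to the Neptune-Export service, calls to the Neptune ML management API on the cluster (/ml/dataprocessing, /ml/modeltraining, /ml/endpoints), and a Gremlin inference query. The sketch below uses hypothetical bucket names, job IDs (dp-1, mt-1, ep-1), and cluster endpoints; verify the exact field names against the current Neptune ML API reference before relying on them:

```python
import json

S3 = "s3://my-bucket/neptune-ml"     # hypothetical bucket
CLUSTER = "https://my-cluster:8182"  # hypothetical Neptune cluster endpoint

# 1. Data export: request body for the Neptune-Export service (exports the
#    graph to S3 as CSV and writes training-data-configuration.json).
export_request = {
    "command": "export-pt",                # export property-graph data
    "outputS3Path": f"{S3}/export",
    "params": {"endpoint": "my-cluster"},  # hypothetical cluster name
}

# 2. Data preprocessing: POST {CLUSTER}/ml/dataprocessing
#    Launches the SageMaker processing job that builds the DGL graph.
dataprocessing = {
    "id": "dp-1",
    "inputDataS3Location": f"{S3}/export",
    "processedDataS3Location": f"{S3}/processed",
}

# 3. Model training: POST {CLUSTER}/ml/modeltraining
#    A SageMaker tuning job tries maxHPONumberOfTrainingJobs hyperparameter
#    configurations and keeps the best-performing model's artifacts.
modeltraining = {
    "id": "mt-1",
    "dataProcessingJobId": "dp-1",
    "trainModelS3Location": f"{S3}/model",
    "maxHPONumberOfTrainingJobs": 4,
    "maxHPOParallelTrainingJobs": 2,
}

# 4. Inference endpoint: POST {CLUSTER}/ml/endpoints
endpoint = {"id": "ep-1", "mlModelTrainingJobId": "mt-1"}

# 5. Gremlin inference query, sent as {"gremlin": "..."} to {CLUSTER}/gremlin.
#    The Neptune#ml steps are the Gremlin ML extensions; "Movie"/"genre" are
#    hypothetical labels for a node-classification model.
gremlin_query = (
    'g.with("Neptune#ml.endpoint", "ep-1")'
    '.V().hasLabel("Movie")'
    '.properties("genre").with("Neptune#ml.classification").value()'
)

for body in (export_request, dataprocessing, modeltraining, endpoint):
    print(json.dumps(body))
print(gremlin_query)
```

Note how each step's request references the ID of the job created in the previous step, which is what chains the pipeline together.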

Note

The Neptune workbench contains a line magic (%neptune_ml) and a cell magic (%%neptune_ml) that can save you a lot of time managing these steps.

Workflows for handling evolving graph data

With a continuously changing graph, you may need to update ML predictions frequently using the newest data. While you can do this simply by re-running steps one through four (from Data export and configuration to Create an inference endpoint in Amazon SageMaker), Neptune ML supports simpler ways to update your ML predictions using new data. One is the incremental-model workflow:

Incremental-model inference workflow

In this workflow, you update the ML predictions without retraining the ML model.

Note

You can only do this when the graph data has been updated with new nodes and/or edges. It does not currently work when nodes are removed.

  1. Data export and configuration   –     This step is the same as in the main workflow.

  2. Incremental data preprocessing   –     This step is similar to the data preprocessing step in the main workflow, but it uses the same processing configuration that was used previously, which corresponds to a specific trained model.

  3. Model transform   –     Instead of a model training step, this model-transform step takes the trained model from the main workflow and the results of the incremental data preprocessing step, and generates new model artifacts to use for inference. The model-transform step launches a SageMaker processing job to perform the computation that generates the updated model artifacts.

  4. Update the Amazon SageMaker inference endpoint   –     This step updates an existing inference endpoint with the new model artifacts generated by the model-transform step. Alternatively, you can create a new inference endpoint with the new model artifacts.
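The model-transform and endpoint-update steps of this workflow can be sketched as two further management API requests. The IDs below (dp-2, mt-1, tf-1, ep-1), bucket names, and the "update" flag are illustrative assumptions; check the current Neptune ML API reference for the exact fields:

```python
import json

S3 = "s3://my-bucket/neptune-ml"  # hypothetical bucket

# 3. Model transform: POST https://<cluster>:8182/ml/modeltransform
#    Combines the previously trained model (mt-1) with the incrementally
#    preprocessed data (dp-2) to produce new model artifacts -- no retraining.
modeltransform = {
    "id": "tf-1",
    "dataProcessingJobId": "dp-2",   # incremental preprocessing job
    "mlModelTrainingJobId": "mt-1",  # original training job
    "modelTransformOutputS3Location": f"{S3}/transform",
}

# 4. Endpoint update: POST https://<cluster>:8182/ml/endpoints
#    Points the existing endpoint at the transform job's new artifacts
#    ("update": True is the assumed flag for updating in place rather than
#    creating a new endpoint).
endpoint_update = {
    "id": "ep-1",  # existing endpoint
    "update": True,
    "mlModelTransformJobId": "tf-1",
}

print(json.dumps(modeltransform))
print(json.dumps(endpoint_update))
```

Because the endpoint is updated in place, Gremlin inference queries keep working against the same endpoint ID while the artifacts behind it are refreshed.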

Model re-training with a warm start

Using this workflow, you can train and deploy a new ML model for making predictions using the incremental graph data, but start from an existing model generated using the main workflow:

  1. Data export and configuration   –     This step is the same as in the main workflow.

  2. Incremental data preprocessing   –     This step is the same as in the incremental model inference workflow. The new graph data should be processed with the same processing method that was used previously for model training.

  3. Model training with a warm start   –     Model training is similar to what happens in the main workflow, but you can speed up the model hyperparameter search by leveraging information from the previous model training job.

  4. Update the Amazon SageMaker inference endpoint   –     This step is the same as in the incremental model inference workflow.
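A warm-start training request can be sketched as a regular modeltraining request that points back at the earlier training job; previousModelTrainingJobId is the parameter that would trigger the warm start. The IDs and S3 path below are hypothetical, and the field names should be checked against the current Neptune ML API reference:

```python
import json

# Warm-start training: POST https://<cluster>:8182/ml/modeltraining
# Same shape as a normal training request, plus previousModelTrainingJobId,
# which lets the hyperparameter search reuse what the earlier job learned.
warm_start_training = {
    "id": "mt-2",
    "dataProcessingJobId": "dp-2",         # incremental preprocessing job
    "trainModelS3Location": "s3://my-bucket/neptune-ml/model-v2",
    "previousModelTrainingJobId": "mt-1",  # warm-start source
}

print(json.dumps(warm_start_training))
```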