Using Amazon Neptune ML for machine learning on graphs
There is often valuable information in large connected datasets that can be hard to extract using queries based on human intuition alone. Machine learning (ML) techniques can help find hidden correlations in graphs with billions of relationships. These correlations can be helpful for recommending products, predicting creditworthiness, identifying fraud, and many other things.
The Neptune ML feature makes it possible to build and train useful machine learning models on large graphs in hours instead of weeks. To accomplish this, Neptune ML uses graph neural network (GNN) technology powered by Amazon SageMaker.
This feature is available starting in Neptune engine release 1.0.4.1.
Graph neural networks (GNNs) are an emerging field in artificial intelligence (see, for example, A Comprehensive Survey on Graph Neural Networks).
What Neptune ML can do
Neptune ML can train machine learning models to support three different categories of inference, described in the following list.
In Neptune ML models, graph vertices are referred to as "nodes". For example, vertex classification uses a node-classification machine learning model, and vertex regression uses a node-regression model.
Types of inference task currently supported by Neptune ML
- Node classification – This task involves predicting the value of a categorical vertex property.
  For example, given the movie The Shawshank Redemption, Neptune ML can predict its genre property as story from a candidate set of [story, crime, action, fantasy, drama, family, ...].
  There are two types of node-classification tasks:
  - Single-class classification: In this kind of task, each node has only one target feature. For example, the property Place_of_birth of Alan Turing has the value UK.
  - Multi-class classification: In this kind of task, each node can have more than one target feature. For example, the property genre of the film The Godfather has the values crime and story.
- Node regression – This task involves predicting a numerical property of a vertex.
  For example, given the movie Avengers: Endgame, Neptune ML can predict that its property popularity has a value of 5.0.
- Link prediction – This task involves predicting the most likely destination nodes for a particular source node and outgoing edge, or the most likely source nodes for a given destination node and incoming edge.
  For example, with a Drug-Disease knowledge graph, given Aspirin as the source node and treats as the outgoing edge, Neptune ML can predict the most relevant destination nodes as heart disease, fever, and so on.
  Or, with the Wikimedia knowledge graph, given President-of as the edge (relation) and United-States as the destination node, Neptune ML can predict the most relevant source nodes (heads) as George Washington, Abraham Lincoln, Franklin D. Roosevelt, and so on.
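To make the differences between these task types concrete, the following Python sketch shows how raw model scores might be turned into each kind of prediction. It is a generic illustration, not Neptune ML's internal code: the helper names, the made-up scores, and the 0.5 threshold used for multi-class classification are all assumptions for this example.

```python
"""Hypothetical sketch: turning raw model scores into predictions for the
inference task types described above. Not Neptune ML code; the function
names, scores, and 0.5 threshold are illustrative assumptions."""

from typing import Dict, List


def predict_single_class(scores: Dict[str, float]) -> str:
    """Single-class node classification: return the one highest-scoring label."""
    return max(scores, key=scores.get)


def predict_multi_class(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-class node classification: return every label whose score meets
    the (assumed) threshold, in alphabetical order."""
    return sorted(label for label, score in scores.items() if score >= threshold)


def predict_top_k_links(candidate_scores: Dict[str, float], k: int) -> List[str]:
    """Link prediction: rank candidate nodes by score, best first, keep the top k."""
    ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up genre scores for a single movie node.
    genre_scores = {"story": 0.81, "crime": 0.62, "action": 0.10, "drama": 0.40}
    print(predict_single_class(genre_scores))  # story
    print(predict_multi_class(genre_scores))   # ['crime', 'story']

    # Made-up scores for candidate destination nodes of one (source, edge) pair.
    link_scores = {"heart disease": 0.90, "fever": 0.70, "fracture": 0.05}
    print(predict_top_k_links(link_scores, k=2))  # ['heart disease', 'fever']
```

Node regression needs no decoding step in this picture: the model's single numeric output (for example, a popularity value) is itself the prediction.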
With Neptune ML, you can use machine learning models that fall in two general categories:
Types of machine learning model currently supported by Neptune ML
- Graph Neural Network (GNN) models – These include Relational Graph Convolutional Networks (R-GCNs). GNN models work for all three types of task above.
- Knowledge-Graph Embedding (KGE) models – These include TransE, DistMult, and RotatE models. They only work for link prediction.
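The three KGE models differ mainly in the function they use to score a candidate (head, relation, tail) triple, where a higher score means a more plausible link. The following plain-Python sketch implements the textbook scoring functions (TransE with an L2 distance, DistMult as a trilinear product, and RotatE as an element-wise rotation in the complex plane) and uses one of them to rank candidate tails for link prediction. The embedding values are toy numbers, and the sketch illustrates the models themselves, not how Neptune ML trains or serves them.

```python
"""Illustrative KGE scoring functions (not Neptune ML code).
Higher score = more plausible (head, relation, tail) triple."""

import cmath
import math
from typing import Callable, Dict, List, Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE: a true triple should satisfy h + r ≈ t, so score = -||h + r - t||_2."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: trilinear product sum_i h_i * r_i * t_i (symmetric in h and t)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_score(h: Sequence[complex], phases: Sequence[float], t: Sequence[complex]) -> float:
    """RotatE: the relation rotates each complex coordinate of h by a phase theta_i,
    so score = -sum_i |h_i * exp(i * theta_i) - t_i|."""
    return -sum(abs(hi * cmath.exp(1j * th) - ti) for hi, th, ti in zip(h, phases, t))


def rank_tails(
    score_fn: Callable[[Sequence[float], Sequence[float], Sequence[float]], float],
    h: Sequence[float],
    r: Sequence[float],
    candidates: Dict[str, Sequence[float]],
) -> List[str]:
    """Link prediction: order candidate tail nodes by score, most plausible first."""
    return sorted(candidates, key=lambda name: score_fn(h, r, candidates[name]), reverse=True)


if __name__ == "__main__":
    head, relation = [1.0, 0.0], [0.0, 1.0]
    tails = {"a": [1.0, 1.0], "b": [0.0, 0.0], "c": [3.0, 1.0]}
    print(rank_tails(transe_score, head, relation, tails))  # ['a', 'b', 'c']
    print(distmult_score([1.0, 2.0], [1.0, 1.0], [3.0, 1.0]))  # 5.0
    # A 90-degree rotation maps 1 onto 1j, so this triple scores (almost exactly) 0.
    print(rotate_score([1 + 0j], [math.pi / 2], [1j]))
```

One consequence of these formulas: DistMult's score does not change when head and tail are swapped, so on its own it cannot tell a directed relation such as treats apart from its reverse. TransE and RotatE do not have this symmetry in general.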
Using the Neptune ML AWS CloudFormation template to get started quickly
The easiest way to get started with Neptune ML is to use the AWS CloudFormation quick-start template. This template installs all the necessary components (including a Neptune DB cluster), enables Neptune ML in lab mode, and sets up the necessary IAM roles.
To create the Neptune ML quick-start stack
- To launch the AWS CloudFormation stack on the AWS CloudFormation console, choose the Launch Stack button for your Region. The quick-start template is available in the following Regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Canada (Central), South America (São Paulo), Europe (Stockholm), Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt), Middle East (Bahrain), Asia Pacific (Hong Kong), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and Asia Pacific (Mumbai).
- On the Select Template page, choose Next.
- On the Specify Details page, choose Next.
- On the Options page, choose Next.
- On the Review page, select the first check box to acknowledge that AWS CloudFormation will create IAM resources. Select the second check box to acknowledge CAPABILITY_AUTO_EXPAND for the new stack.
  Note: CAPABILITY_AUTO_EXPAND explicitly acknowledges that macros will be expanded when creating the stack, without prior review. Users often create a change set from a processed template so that the changes made by macros can be reviewed before actually creating the stack. For more information, see the AWS CloudFormation CreateStack API.
  Then choose Create.
The quick-start template creates and sets up the following:
- A Neptune DB cluster.
- The necessary IAM roles (and attaches them).
- The necessary Amazon EC2 security group.
- The necessary SageMaker VPC endpoints.
- A DB cluster parameter group for Neptune ML.
- The necessary parameters in that parameter group.
- A SageMaker notebook with pre-populated notebook samples for Neptune ML.
- The Neptune-Export service.
When the quick-start stack is ready, go to the SageMaker notebook that the template created and check out the pre-populated examples. They will help you download sample datasets to use for experimenting with Neptune ML capabilities.
They can also save you a lot of time when you are using Neptune ML. For example, see the %neptune_ml line magic and the %%neptune_ml cell magic that these notebooks support.
You can also use the following AWS CLI command to run the quick-start AWS CloudFormation template:
aws cloudformation create-stack \
  --stack-name neptune-ml-fullstack-$(date '+%Y-%m-%d-%H-%M') \
  --template-url https://s3.amazonaws.com/aws-neptune-customer-samples/v2/cloudformation-templates/neptune-ml-nested-stack.json \
  --parameters ParameterKey=EnableIAMAuthOnExportAPI,ParameterValue=(true if you have IAM auth enabled, or false otherwise) \
               ParameterKey=Env,ParameterValue=test$(date '+%H%M') \
  --capabilities CAPABILITY_IAM \
  --region (the AWS region, like us-east-1) \
  --disable-rollback \
  --profile (optionally, a named CLI profile of yours)
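The create-stack command only starts stack creation; the SageMaker notebook is not usable until the stack reaches CREATE_COMPLETE. The following Python sketch waits for that using the boto3 CloudFormation client's stack_create_complete waiter, then prints the stack status and whatever outputs the template defines (no particular output key names are assumed). It assumes boto3 is installed and that credentials for the same account and Region are configured; pass it the stack name you gave to --stack-name. The script name in the usage message is hypothetical.

```python
"""Sketch: wait for the quick-start stack to finish and list its outputs.
Assumes boto3 is installed and AWS credentials are configured. Output key
names are whatever the template defines; none are assumed here."""

from typing import Dict, List


def format_outputs(stack: Dict) -> List[str]:
    """Turn the Outputs of one describe_stacks entry into 'key = value' lines."""
    return [f"{o['OutputKey']} = {o['OutputValue']}" for o in stack.get("Outputs", [])]


def wait_for_stack(stack_name: str, region: str) -> List[str]:
    import boto3  # imported here so format_outputs can be used without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    # Polls until CREATE_COMPLETE. Raises botocore.exceptions.WaiterError if the
    # stack reaches a failure state (for example CREATE_FAILED, which is where a
    # failed stack stays when --disable-rollback is used) or polling times out.
    cfn.get_waiter("stack_create_complete").wait(
        StackName=stack_name,
        WaiterConfig={"Delay": 30, "MaxAttempts": 120},  # up to about an hour; adjust as needed
    )
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    return [f"StackStatus = {stack['StackStatus']}"] + format_outputs(stack)


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        print("usage: wait_for_stack.py STACK_NAME REGION")
    else:
        for line in wait_for_stack(sys.argv[1], sys.argv[2]):
            print(line)
```

If you prefer to stay in the shell, aws cloudformation wait stack-create-complete --stack-name (your stack name) performs the same check.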