Using Amazon Neptune ML for machine learning on graphs

Large connected datasets often contain valuable information that is hard to extract using queries based on human intuition alone. Machine learning (ML) techniques can help find hidden correlations in graphs with billions of relationships. These correlations can be helpful for recommending products, predicting creditworthiness, identifying fraud, and many other things.

The Neptune ML feature makes it possible to build and train useful machine learning models on large graphs in hours instead of weeks. To accomplish this, Neptune ML uses graph neural network (GNN) technology powered by Amazon SageMaker and the open-source Deep Graph Library (DGL).

Note

This feature is available starting in Neptune engine release 1.0.4.1.

Graph neural networks (GNNs) are an emerging field in artificial intelligence (see, for example, A Comprehensive Survey on Graph Neural Networks). For a hands-on tutorial about using GNNs with DGL, see Learning graph neural networks with Deep Graph Library.

What Neptune ML can do

Neptune ML can train machine learning models to support three different categories of inference:

Note

Graph vertices are identified in Neptune ML models as "nodes". For example, vertex classification uses a node-classification machine learning model, and vertex regression uses a node-regression model.

Types of inference task currently supported by Neptune ML

  • Node classification   –   This task involves predicting a categorical property of a vertex.

    For example, given the movie The Shawshank Redemption, Neptune ML can predict its genre property as story from a candidate set of [story, crime, action, fantasy, drama, family, ...].

    There are two types of node-classification tasks:

    • Single-class classification: In this kind of task, each node has only one target feature. For example, the property Place_of_birth of Alan Turing has the value UK.

    • Multi-class classification: In this kind of task, each node can have more than one target feature. For example, the property genre of the film The Godfather has the values crime and story.

  • Node regression   –   This task involves predicting a numerical property of a vertex.

    For example, given the movie Avengers: Endgame, Neptune ML can predict that its property popularity has a value of 5.0.

  • Link prediction   –   This task involves predicting the most likely destination nodes for a particular source node and outgoing edge, or the most likely source nodes for a given destination node and incoming edge.

    For example, with a Drug-Disease knowledge graph, given Aspirin as the source node, and treats as the outgoing edge, Neptune ML can predict the most relevant destination nodes as heart disease, fever, and so on.

    Or, with the Wikimedia knowledge graph, given President-of as the edge (relation) and United-States as the destination node, Neptune ML can predict the most relevant source nodes (heads) as George Washington, Abraham Lincoln, Franklin D. Roosevelt, and so on.
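The difference between single-class and multi-class node classification described above can be pictured as the shape of the target vector attached to each node. The following is a minimal illustrative sketch only, not Neptune ML's internal encoding: the helper name encode_targets and the GENRES list are assumptions made up for this example, with the genre values borrowed from the movie examples.

```python
from typing import Iterable, List

# Hypothetical candidate label set, borrowed from the genre example above.
GENRES = ["story", "crime", "action", "fantasy", "drama", "family"]


def encode_targets(labels: Iterable[str], candidates: List[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at the position of every label present."""
    present = set(labels)
    unknown = present - set(candidates)
    if unknown:
        raise ValueError(f"labels not in candidate set: {sorted(unknown)}")
    return [1 if c in present else 0 for c in candidates]


# Single-class: exactly one target value per node (a one-hot vector).
single = encode_targets(["story"], GENRES)

# Multi-class, in the sense used above: several target values per node.
multi = encode_targets(["crime", "story"], GENRES)

print(single)  # [1, 0, 0, 0, 0, 0]
print(multi)   # [1, 1, 0, 0, 0, 0]
```

In the single-class case a model picks one position; in the multi-class case it decides independently for each position whether the label applies.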

With Neptune ML, you can use machine learning models that fall into two general categories:

Types of machine learning model currently supported by Neptune ML

  • Graph Neural Network (GNN) models   –   These include Relational Graph Convolutional Networks (R-GCNs). GNN models work for all three types of task above.

  • Knowledge-Graph Embedding (KGE) models   –   These include TransE, DistMult, and RotatE models. They only work for link prediction.
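To give a feel for why KGE models suit link prediction, here is a rough sketch of the scoring functions commonly associated with TransE, DistMult, and RotatE, using the Aspirin example from above. It is an illustration under simplifying assumptions: the embeddings are tiny hand-picked vectors, the function names are invented for this example, the margin term that trained models usually add is omitted, and nothing here reflects how Neptune ML implements or trains these models.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """TransE: a plausible triple has h + r close to t; score = -||h + r - t||_2."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult: tri-linear product, sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_score(h: Sequence[complex], phases: Sequence[float], t: Sequence[complex]) -> float:
    """RotatE: the relation rotates each complex coordinate of h by a phase;
    score = -(sum of per-coordinate moduli of h * e^{i*phase} - t)."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


def rank_tails(h, r, candidates: Dict[str, Sequence[float]], score_fn: Callable) -> List[str]:
    """Order candidate destination nodes from most to least plausible."""
    return sorted(candidates, key=lambda name: score_fn(h, r, candidates[name]), reverse=True)


# Hypothetical 2-dimensional embeddings, chosen by hand for illustration.
aspirin = [1.0, 0.0]
treats = [0.0, 1.0]
diseases = {
    "fever": [1.0, 1.0],
    "heart disease": [1.0, 0.5],
    "fracture": [-1.0, -1.0],
}

ranking = rank_tails(aspirin, treats, diseases, transe_score)
print(ranking)
```

Every one of these functions scores a (source, edge, destination) triple, which is the shape of a link-prediction question. They have no notion of predicting a node property, which is consistent with KGE models being limited to link prediction.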

Using the Neptune ML AWS CloudFormation template to get started quickly

The easiest way to get started with Neptune ML is to use the AWS CloudFormation quick-start template. This template installs all necessary components including a Neptune DB cluster, enables Neptune ML in lab mode, and sets up the necessary IAM roles.

To create the Neptune ML quick-start stack

  1. To launch the AWS CloudFormation stack on the AWS CloudFormation console, choose one of the Launch Stack buttons in the following table:

    Region                      View   View in Designer   Launch
    US East (N. Virginia)       View   View in Designer
    US East (Ohio)              View   View in Designer
    US West (N. California)     View   View in Designer
    US West (Oregon)            View   View in Designer
    Canada (Central)            View   View in Designer
    South America (São Paulo)   View   View in Designer
    Europe (Stockholm)          View   View in Designer
    Europe (Ireland)            View   View in Designer
    Europe (London)             View   View in Designer
    Europe (Paris)              View   View in Designer
    Europe (Frankfurt)          View   View in Designer
    Middle East (Bahrain)       View   View in Designer
    Asia Pacific (Hong Kong)    View   View in Designer
    Asia Pacific (Tokyo)        View   View in Designer
    Asia Pacific (Seoul)        View   View in Designer
    Asia Pacific (Singapore)    View   View in Designer
    Asia Pacific (Sydney)       View   View in Designer
    Asia Pacific (Mumbai)       View   View in Designer

  2. On the Select Template page, choose Next.

  3. On the Specify Details page, choose Next.

  4. On the Options page, choose Next.

  5. On the Review page, select the first check box to acknowledge that AWS CloudFormation will create IAM resources. Select the second check box to acknowledge CAPABILITY_AUTO_EXPAND for the new stack.

    Note

    CAPABILITY_AUTO_EXPAND explicitly acknowledges that macros will be expanded when creating the stack, without prior review. Users often create a change set from a processed template so that the changes made by macros can be reviewed before actually creating the stack. For more information, see the AWS CloudFormation CreateStack API.

    Then choose Create.

The quick-start template creates and sets up the following:

  • A Neptune DB cluster.

  • The necessary IAM roles (and attaches them).

  • The necessary Amazon EC2 security group.

  • The necessary SageMaker VPC endpoints.

  • A DB cluster parameter group for Neptune ML.

  • The necessary parameters in that parameter group.

  • A SageMaker notebook with pre-populated notebook samples for Neptune ML.

  • The Neptune-Export service.

When the quick-start stack is ready, go to the SageMaker notebook that the template created and check out the pre-populated examples. They will help you download sample datasets to use for experimenting with Neptune ML capabilities.

The notebooks can also save you a lot of time when you are using Neptune ML. For example, see the %neptune_ml line magic and the %%neptune_ml cell magic that these notebooks support.

You can also use the following AWS CLI command to run the quick-start AWS CloudFormation template:

aws cloudformation create-stack \
  --stack-name neptune-ml-fullstack-$(date '+%Y-%m-%d-%H-%M') \
  --template-url https://s3.amazonaws.com/aws-neptune-customer-samples/v2/cloudformation-templates/neptune-ml-nested-stack.json \
  --parameters ParameterKey=EnableIAMAuthOnExportAPI,ParameterValue=(true if you have IAM auth enabled, or false otherwise) \
               ParameterKey=Env,ParameterValue=test$(date '+%H%M') \
  --capabilities CAPABILITY_IAM \
  --region (the AWS region, like us-east-1) \
  --disable-rollback \
  --profile (optionally, a named CLI profile of yours)
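If you prefer to script the launch from Python, the same request can be sketched with the AWS SDK for Python (boto3). This is a hedged sketch rather than an official recipe: it assumes boto3 is installed and credentials are configured, the function names build_create_stack_args and launch are invented for this example, and the argument values simply mirror the CLI command above.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Template URL copied from the CLI command above.
TEMPLATE_URL = (
    "https://s3.amazonaws.com/aws-neptune-customer-samples/v2/"
    "cloudformation-templates/neptune-ml-nested-stack.json"
)


def build_create_stack_args(enable_iam_auth: bool,
                            now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build keyword arguments for CloudFormation create_stack (no AWS call)."""
    now = now or datetime.now()
    return {
        "StackName": "neptune-ml-fullstack-" + now.strftime("%Y-%m-%d-%H-%M"),
        "TemplateURL": TEMPLATE_URL,
        "Parameters": [
            {
                "ParameterKey": "EnableIAMAuthOnExportAPI",
                # CloudFormation parameter values are passed as strings.
                "ParameterValue": "true" if enable_iam_auth else "false",
            },
            {"ParameterKey": "Env", "ParameterValue": "test" + now.strftime("%H%M")},
        ],
        "Capabilities": ["CAPABILITY_IAM"],
        "DisableRollback": True,
    }


def launch(region: str, enable_iam_auth: bool, profile: Optional[str] = None) -> str:
    """Create the stack and block until creation finishes; returns the stack ID."""
    import boto3  # third-party dependency, imported lazily (assumed installed)

    session = boto3.Session(profile_name=profile, region_name=region)
    cfn = session.client("cloudformation")
    stack_id = cfn.create_stack(**build_create_stack_args(enable_iam_auth))["StackId"]
    # Raises botocore.exceptions.WaiterError if stack creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_id)
    return stack_id


# Inspect the request without contacting AWS.
print(build_create_stack_args(enable_iam_auth=True)["StackName"])
```

Keeping the argument builder separate from the call that contacts AWS makes it possible to review or log the exact request before any resources are created. Calling launch("us-east-1", True) would then start the stack in that Region.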