Using Amazon Neptune with graph notebooks
To work with Neptune graphs, you can use a Neptune graph notebook, or create a new Neptune database using an AWS CloudFormation template.
Whether you're new to graphs and want to learn and experiment, or you're experienced and want to refine your queries, the Neptune workbench offers an interactive development environment (IDE) that can boost your productivity when you're building graph applications. The workbench provides a user-friendly interface for interacting with your Neptune database, writing queries, and visualizing your data.
By using the AWS CloudFormation template to set up your Neptune database, and the workbench to develop your graph applications, you can get started with Neptune quickly and efficiently, without the need for additional tooling. This allows you to focus on building your applications rather than setting up the underlying infrastructure.
Neptune provides Jupyter and JupyterLab notebooks through the open-source graph-notebook project.
You can host these notebooks in several different ways:
- The Neptune workbench lets you run Jupyter notebooks in a fully managed environment, hosted in Amazon SageMaker, and automatically loads the latest release of the Neptune graph notebook project for you. It is easy to set up the workbench in the Neptune console when you create a new Neptune database.
Note
When creating a Neptune notebook instance, you are provided with two options for network access: Direct access through Amazon SageMaker (the default) and access through a VPC. In either option, the notebook requires access to the internet to fetch package dependencies for installing the Neptune workbench. Lack of internet access will cause the creation of a Neptune notebook instance to fail.
- You can also install Jupyter locally. This lets you run the notebooks from your laptop, connected either to Neptune or to a local instance of one of the open-source graph databases. In the latter case, you can experiment with graph technology as much as you want before you spend a penny. Then, when you're ready, you can move smoothly to the managed production environment that Neptune offers.
Using the Neptune workbench to host Neptune notebooks
Neptune offers T3 and T4g instance types that you can get started with for less than $0.10 per hour. You are billed for workbench resources through Amazon SageMaker, separately from your Neptune billing. See the Neptune pricing page for details.
You can create a Jupyter or JupyterLab notebook using the Neptune workbench in the AWS Management Console in either of two ways:
Use the Notebook configuration menu when creating a new Neptune DB cluster. To do this, follow the steps outlined in Launching a Neptune DB cluster using the AWS Management Console.
Use the Notebooks menu in the left navigation pane after your DB cluster has already been created. To do this, follow the steps below.
To create a Jupyter or JupyterLab notebook using the Notebooks menu
Sign in to the AWS Management Console, and open the Amazon Neptune console at https://console.aws.amazon.com/neptune/home.
In the navigation pane on the left, choose Notebooks.
Choose Create notebook.
In the Cluster list, choose your Neptune DB cluster. If you don't yet have a DB cluster, choose Create cluster to create one.
Select a Notebook instance type.
Give your notebook a name, and optionally a description.
Unless you already created an AWS Identity and Access Management (IAM) role for your notebooks, choose Create an IAM role, and enter an IAM role name.
Note
If you choose to re-use an IAM role created for a previous notebook, the role policy must contain the correct permissions to access the Neptune DB cluster that you're using. You can verify this by checking that the components in the resource ARN under the neptune-db:* action match that cluster. Incorrectly configured permissions result in connection errors when you try to run notebook magic commands.
Choose Create notebook. The creation process may take 5 to 10 minutes before everything is ready.
After your notebook is created, select it and then choose Open Jupyter or Open JupyterLab.
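If you prefer to script notebook creation rather than click through the console, the workbench instance can also be created through the SageMaker API. The sketch below only assembles the request parameters; the helper name and placeholder values are hypothetical, and the actual boto3 create_notebook_instance call is left as a comment so you can review the request first:

```python
import json

def build_notebook_request(name, instance_type, role_arn, lifecycle_config=None):
    """Assemble parameters for sagemaker create_notebook_instance.

    Neptune workbench notebooks are SageMaker notebook instances whose
    names begin with "aws-neptune-"; that prefix is how the Neptune
    console finds and lists them.
    """
    if not name.startswith("aws-neptune-"):
        name = "aws-neptune-" + name
    params = {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }
    if lifecycle_config:
        params["LifecycleConfigName"] = lifecycle_config
    return params

# Example (the role ARN is a placeholder):
params = build_notebook_request("my-graph-notebook", "ml.t3.medium",
                                "arn:aws:iam::123456789012:role/NeptuneNotebookRole")
print(json.dumps(params, indent=2))
# To actually create the instance you would pass these parameters to
# boto3.client("sagemaker").create_notebook_instance(**params)
```

Building the request as plain data first makes it easy to inspect or log before anything is created in your account.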
The console can create an AWS Identity and Access Management (IAM) role for your notebooks, or you can create one yourself. The policy for this role should include the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::aws-neptune-notebook-(AWS region)",
        "arn:aws:s3:::aws-neptune-notebook-(AWS region)/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "neptune-db:*",
      "Resource": [
        "arn:aws:neptune-db:(AWS region):(AWS account ID):(Neptune resource ID)/*"
      ]
    }
  ]
}
Note that the second statement in the policy above lists one or more Neptune cluster resource IDs.
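To double-check that a re-used role's policy actually covers your cluster, you can assemble the expected ARN from your Region, account ID, and the cluster's resource ID, and compare it against the policy statement. A minimal sketch (the helper names and example IDs are made up for illustration):

```python
def neptune_db_arn(region, account_id, cluster_resource_id):
    """Assemble the resource ARN used by neptune-db:* actions.

    Note that the ARN contains the DB cluster *resource ID*
    (e.g. "cluster-ABCDEFG"), not the cluster name.
    """
    return f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"

def policy_covers_cluster(statement, region, account_id, cluster_resource_id):
    """Check whether a policy statement's Resource list includes the cluster."""
    resources = statement.get("Resource", [])
    if isinstance(resources, str):
        resources = [resources]
    return neptune_db_arn(region, account_id, cluster_resource_id) in resources

statement = {
    "Effect": "Allow",
    "Action": "neptune-db:*",
    "Resource": ["arn:aws:neptune-db:us-east-1:123456789012:cluster-ABCDEFG/*"],
}
print(policy_covers_cluster(statement, "us-east-1", "123456789012", "cluster-ABCDEFG"))
# → True
```

If this check fails for the cluster you intend to use, notebook magic commands will fail with connection errors, as noted above.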
Also, the role should establish the following trust relationship:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
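A quick way to sanity-check an existing role's trust policy is to look for an Allow statement that grants sts:AssumeRole to sagemaker.amazonaws.com. A hedged sketch (the helper name is hypothetical, not an AWS SDK call):

```python
def allows_sagemaker_assume(trust_policy):
    """Return True if the trust policy lets SageMaker assume the role."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        # Principal.Service and Action may each be a string or a list
        services = stmt.get("Principal", {}).get("Service", [])
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sagemaker.amazonaws.com" in services and "sts:AssumeRole" in actions:
            return True
    return False

trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
print(allows_sagemaker_assume(trust))  # → True
```

You could feed this function the document returned by the IAM get_role API when auditing existing notebook roles.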
Again, getting everything ready to go can take 5 to 10 minutes.
You can configure your new notebook to work with Neptune ML, as explained in Manually configuring a Neptune notebook for Neptune ML.
Using Python to connect a generic SageMaker notebook to Neptune
Connecting a notebook to Neptune is easy if you have installed the Neptune magics, but it is also possible to connect a SageMaker notebook to Neptune using Python, even if you are not using a Neptune notebook.
Steps to take to connect to Neptune in a SageMaker notebook cell
- Install the Gremlin Python client:
!pip install gremlinpython
Neptune notebooks install the Gremlin Python client for you, so this step is only necessary if you're using a plain SageMaker notebook.
- Write code such as the following to connect and issue a Gremlin query:
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.aiohttp.transport import AiohttpTransport
from gremlin_python.process.traversal import *

port = 8182
server = '(your server endpoint)'

endpoint = f'wss://{server}:{port}/gremlin'

graph = Graph()

connection = DriverRemoteConnection(endpoint, 'g',
    transport_factory=lambda: AiohttpTransport(call_from_event_loop=True))

g = graph.traversal().withRemote(connection)

results = (g.V().hasLabel('airport')
            .sample(10)
            .order()
            .by('code')
            .local(__.values('code', 'city').fold())
            .toList())

# Print the results in a tabular form with a row index
for i, c in enumerate(results, 1):
    print("%3d %4s %s" % (i, c[0], c[1]))

connection.close()
Note
If you happen to be using a version of the Gremlin Python client that is older than 3.5.0, this line:
connection = DriverRemoteConnection(endpoint,'g', transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))
would instead be:
connection = DriverRemoteConnection(endpoint,'g')
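If your notebook code needs to work across client versions, you can branch on the installed gremlinpython version before building the connection. A small sketch of the version check (the helper name is hypothetical; it only parses the major and minor components):

```python
def needs_transport_factory(version):
    """Return True for gremlinpython 3.5.0 or later, which uses aiohttp
    and needs the AiohttpTransport factory when called from a running
    event loop (as in a Jupyter notebook)."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (3, 5)

print(needs_transport_factory("3.4.13"))  # → False
print(needs_transport_factory("3.6.2"))   # → True
```

In practice you could obtain the installed version from `importlib.metadata.version("gremlinpython")` and pass the transport_factory argument only when this returns True.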
Enabling CloudWatch logs on Neptune Notebooks
CloudWatch logs are now enabled by default for Neptune Notebooks. If you have an older notebook that is not producing CloudWatch logs, follow these steps to enable them manually:
Sign in to the AWS Management Console and open the SageMaker console.
In the navigation pane on the left, choose Notebook, then Notebook Instances. Look for the name of the Neptune notebook for which you would like to enable logs.
Go to the details page by selecting the name of that notebook instance.
If the notebook instance is running, select the Stop button at the top right of the notebook details page.
Under Permissions and encryption there is a field for IAM role ARN. Select the link in this field to go to the IAM role that this notebook instance runs with.
Create the following policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogDelivery",
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:DeleteLogDelivery",
        "logs:Describe*",
        "logs:GetLogDelivery",
        "logs:GetLogEvents",
        "logs:ListLogDeliveries",
        "logs:PutLogEvents",
        "logs:PutResourcePolicy",
        "logs:UpdateLogDelivery"
      ],
      "Resource": "*"
    }
  ]
}
Save this new policy and attach it to the IAM role found in Step 4.
Select Start at the top right of the SageMaker notebook instance details page.
When logs start flowing, you should see a View Logs link beneath the field labeled Lifecycle configuration near the bottom left of the Notebook instance settings section of the details page.
If a notebook fails to start, a message appears on the notebook details page in the SageMaker console, stating that the notebook instance took over 5 minutes to start. CloudWatch logs relevant to this issue can be found under this name:
(your-notebook-name)/LifecycleConfigOnStart
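SageMaker notebook instances write lifecycle-configuration output to the /aws/sagemaker/NotebookInstances CloudWatch log group (worth verifying in your own account). A small helper to compute the group and stream names, which you could then pass to boto3's get_log_events; the helper name itself is hypothetical:

```python
def lifecycle_log_location(notebook_name):
    """Return the (log group, log stream) pair where SageMaker writes
    on-start lifecycle-configuration output for a notebook instance."""
    return ("/aws/sagemaker/NotebookInstances",
            f"{notebook_name}/LifecycleConfigOnStart")

group, stream = lifecycle_log_location("aws-neptune-my-notebook")
print(group)   # → /aws/sagemaker/NotebookInstances
print(stream)  # → aws-neptune-my-notebook/LifecycleConfigOnStart
# With boto3 you could then fetch the events:
# boto3.client("logs").get_log_events(logGroupName=group, logStreamName=stream)
```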
Setting up graph notebooks on your local machine
The graph-notebook project has instructions for setting up Neptune notebooks on your local machine.
You can connect your local notebooks either to a Neptune DB cluster, or to a local or remote instance of an open-source graph database.
Using Neptune notebooks with Neptune clusters
If you are connecting to a Neptune cluster on the back end, you may want to run the notebooks in Amazon SageMaker. Connecting to Neptune from SageMaker can be more convenient than from a local installation of the notebooks, and it will let you work more easily with Neptune ML.
For instructions about how to set up notebooks in SageMaker, see Launching graph-notebook using Amazon SageMaker.
For instructions about how to set up and configure Neptune itself, see Setting up Amazon Neptune.
You can also connect a local installation of the Neptune notebooks to a Neptune DB cluster. This can be somewhat more complicated because Amazon Neptune DB clusters can only be created in an Amazon Virtual Private Cloud (VPC), which is by design isolated from the outside world. There are a number of ways to connect into a VPC from outside it. One is to use a load balancer. Another is to use VPC peering (see the Amazon Virtual Private Cloud Peering Guide).
The most convenient way for most people, however, is to set up an Amazon EC2 proxy server within the VPC and then use SSH tunneling. You can find instructions in the additional-databases/neptune folder of the graph-notebook project.
Using Neptune notebooks with open-source graph databases
To get started with graph technology at no cost, you can also use Neptune notebooks with various open-source databases on the back end, such as the TinkerPop Gremlin Server and Blazegraph.
To use Gremlin Server as your back-end database, see:
- The Connecting graph-notebook to a Gremlin Server GitHub folder.
- The graph-notebook Gremlin configuration GitHub folder.
To use a local instance of Blazegraph:
Review the Blazegraph quick-start instructions to understand the basic setup and configuration required for running a Blazegraph instance.
Access the graph-notebook Blazegraph configuration GitHub folder, which contains the necessary files and instructions for setting up a local Blazegraph instance. Within the GitHub repository, navigate to the "blazegraph" directory and follow the provided instructions. This includes steps for downloading the Blazegraph software, configuring the necessary files, and starting the Blazegraph server.
Once you have a local Blazegraph instance running, you can use it as the back-end database for your graph-based data and queries. Refer to the documentation and example code provided in the graph-notebook repository to learn how to connect to the Blazegraph instance.
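As a quick smoke test of a local Blazegraph instance, you can POST a SPARQL query to its HTTP endpoint. A sketch using only the Python standard library; the endpoint path assumes the standalone Blazegraph jar's default of port 9999, so adjust it for your setup:

```python
from urllib.parse import urlencode
from urllib.request import Request

def blazegraph_query(sparql, endpoint="http://localhost:9999/blazegraph/sparql"):
    """Build an HTTP POST request that runs a SPARQL query against a
    local Blazegraph instance (default endpoint of the standalone jar)."""
    data = urlencode({"query": sparql}).encode()
    return Request(endpoint, data=data,
                   headers={"Accept": "application/sparql-results+json"})

req = blazegraph_query("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(req.full_url)      # → http://localhost:9999/blazegraph/sparql
print(req.get_method())  # → POST
# Once Blazegraph is running, send it with urllib.request.urlopen(req)
# and parse the JSON results body.
```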
Migrating your Neptune notebooks from Jupyter to JupyterLab 3
Neptune notebooks created prior to December 21, 2022 use the Amazon Linux 1 environment. You can migrate older Jupyter notebooks created before that date to the new Amazon Linux 2 environment with JupyterLab 3 by taking the steps described in this AWS blog post: Migrate your work to an Amazon SageMaker notebook instance with Amazon Linux 2.
In addition, a few more steps apply specifically to migrating Neptune notebooks to the new environment:
Neptune-specific prerequisites
In the source Neptune notebook's IAM role, add all of the following permissions:
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:ListBucket",
    "s3:CreateBucket",
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::(your ebs backup bucket name)",
    "arn:aws:s3:::(your ebs backup bucket name)/*"
  ]
},
{
  "Effect": "Allow",
  "Action": [
    "sagemaker:ListTags"
  ],
  "Resource": [
    "*"
  ]
}
Be sure to specify the correct ARN for the S3 bucket you will use for backing up.
Neptune-specific lifecycle configuration
When creating the second Lifecycle configuration script for restoring the backup (from on-create.sh) as described in the blog post, the Lifecycle name must follow the aws-neptune-* format, such as aws-neptune-sync-from-s3. This ensures that the LCC can be selected during notebook creation in the Neptune console.
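A tiny validation helper for this naming rule (the function is illustrative, not part of any AWS SDK):

```python
import re

def is_valid_neptune_lcc_name(name):
    """Check that a Lifecycle configuration name follows the
    aws-neptune-* pattern required for it to appear in the Neptune
    console's notebook-creation form."""
    return re.fullmatch(r"aws-neptune-.+", name) is not None

print(is_valid_neptune_lcc_name("aws-neptune-sync-from-s3"))  # → True
print(is_valid_neptune_lcc_name("my-sync-script"))            # → False
```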
Neptune-specific synchronization from a snapshot to a new instance
In the steps described in the blog post for synchronizing from a snapshot to a new instance, here are the Neptune-specific changes:
On step 4, choose notebook-al2-v2.
On step 5, re-use the IAM role from the source Neptune notebook.
Between steps 7 and 8:
In Notebook instance settings, set a name that uses the aws-neptune-* format.
Open the Network settings accordion and select the same VPC, Subnet, and Security group as in the source notebook.
Neptune-specific steps after the new notebook has been created
Select the Open Jupyter button for the notebook. Once the SYNC_COMPLETE file shows up in the main directory, proceed to the next step.
Go to the notebook instance page in the SageMaker console.
Stop the notebook.
Select Edit.
In the notebook instance settings, edit the Lifecycle configuration field by selecting the source Neptune notebook's original Lifecycle. Note that this is not the EBS backup Lifecycle.
Select Update notebook settings.
Start the notebook again.
With the modifications described here to the steps outlined in the blog post, your graph notebooks should now be migrated onto a new Neptune notebook instance that uses the Amazon Linux 2 and JupyterLab 3 environment. They'll show up for access and management on the Neptune page in the AWS Management Console, and you can now continue your work from where you left off by selecting either Open Jupyter or Open JupyterLab.