Use Neptune graph notebooks to get started quickly - Amazon Neptune

You don't have to use Neptune graph notebooks to work with a Neptune graph, so if you want to, you can go ahead and create a new Neptune database right away using an AWS CloudFormation template.

At the same time, whether you're new to graphs and want to learn and experiment, or you're experienced and want to refine your queries, Neptune graph notebooks offer a great development platform and can be a huge time-saver.

Neptune provides open-source Jupyter notebooks in the Neptune graph notebook project on GitHub. These notebooks present tutorials and code samples in an interactive coding environment where you can learn about graph technology and Neptune.

All the notebooks in the /src/graph_notebook/notebooks folder of the graph-notebook GitHub repository are open-source. You can use them to walk through setting up, configuring, populating and querying graphs using different query languages, different data sets, and even different databases on the back end.

You can host these notebooks in several different ways:

  • The Neptune workbench lets you run Jupyter notebooks in a fully managed environment, hosted in Amazon SageMaker, and automatically connects to the Neptune graph notebook project for you. It is easy to set up the workbench in the Neptune console when you create a new Neptune database.

  • You can also install Jupyter locally and run the notebooks from your laptop, connected either to Neptune or to a local instance of one of the open-source graph databases. In the latter case you can experiment with graph technology as much as you want before you spend a penny, and then move smoothly to the managed production environment that Neptune offers.

Using the Neptune workbench to host Neptune notebooks

An easy way to set up Neptune Jupyter notebooks, and also any notebooks that you create yourself, is to use the Neptune workbench. The workbench provides a fully managed environment for notebooks, hosted by Amazon SageMaker, and automatically links to the notebooks in the open-source graph notebook project.

Neptune offers T3 instance types that you can get started with for only $0.10/Hr (see the Neptune pricing page).

You are billed for workbench resources through Amazon SageMaker, separately from your Neptune billing.

You can use the Neptune console to set up the Neptune workbench in Amazon SageMaker when you create a new DB cluster. After you have done that, it is easy to use the Neptune notebooks, or to create a new Jupyter notebook of your own, like this:

To create a Jupyter notebook using the Neptune workbench

  1. Make sure that the security group attached in the VPC where Neptune is running has a rule that allows inbound connections from SageMaker.

  2. Sign in to the AWS Management Console, and open the Amazon Neptune console.

  3. In the navigation pane on the left, choose Notebooks.

  4. Choose Create notebook.

  5. In the Cluster list, choose your Neptune DB cluster. If you don't yet have a DB cluster, choose Create cluster to create one.

  6. Give your notebook a name, and optionally a description.

  7. Unless you already created an AWS Identity and Access Management (IAM) role for your notebooks, choose Create an IAM role, and enter an IAM role name.

  8. Choose Create notebook. The creation process may take 10 to 15 minutes before everything is ready.

  9. After your notebook is created, select it and then choose Open notebook.

The console can create an AWS Identity and Access Management (IAM) role for your notebooks, or you can create one yourself. The policy for this role should include the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::aws-neptune-notebook",
        "arn:aws:s3:::aws-neptune-notebook/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "neptune-db:*",
      "Resource": [
        "your-cluster-arn/*"
      ]
    }
  ]
}

Also, the role should establish the following trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Again, getting everything ready to go can take 10 to 15 minutes.
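
As a minimal sketch, the permissions policy shown above could be assembled in Python so that the cluster ARN is filled in programmatically rather than edited by hand. The function name `build_notebook_policy` and the example ARN are hypothetical and used only for illustration; the statements themselves are copied from the policy above.

```python
import json


def build_notebook_policy(cluster_arn: str) -> dict:
    """Return the notebook permissions policy for one Neptune cluster.

    `cluster_arn` replaces the "your-cluster-arn" placeholder in the
    policy shown in the text.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to the bucket named in the policy above.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::aws-neptune-notebook",
                    "arn:aws:s3:::aws-neptune-notebook/*",
                ],
            },
            {
                # Data-plane access to the chosen Neptune cluster.
                "Effect": "Allow",
                "Action": "neptune-db:*",
                "Resource": [f"{cluster_arn}/*"],
            },
        ],
    }


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:neptune-db:us-east-1:123456789012:cluster-EXAMPLE"
print(json.dumps(build_notebook_policy(example_arn), indent=2))
```

The printed JSON can be pasted into the IAM console when you create the role yourself.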

Using Python to connect a generic SageMaker notebook to Neptune

Connecting a notebook to Neptune is easy if you have installed the Neptune magics, but it is also possible to connect a SageMaker notebook to Neptune using Python, even if you are not using a Neptune notebook.

To connect to Neptune in a SageMaker notebook cell

  1. Install the Gremlin Python client:

    !pip install gremlinpython

    Neptune notebooks install the Gremlin Python client for you, so this step is only necessary if you're using a plain SageMaker notebook.

  2. Write code such as the following to connect and issue a Gremlin query:

    from gremlin_python import statics
    from gremlin_python.structure.graph import Graph
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.process.strategies import *
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.driver.aiohttp.transport import AiohttpTransport
    from gremlin_python.process.traversal import *
    import os

    port = 8182
    server = '(your server endpoint)'

    endpoint = f'wss://{server}:{port}/gremlin'

    graph=Graph()

    connection = DriverRemoteConnection(endpoint,'g',
                     transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

    g = graph.traversal().withRemote(connection)

    results = (g.V().hasLabel('airport')
                    .sample(10)
                    .order().by('code')
                    .local(__.values('code','city').fold())
                    .toList())

    # Print the results in a tabular form with a row index
    for i,c in enumerate(results,1):
        print("%3d %4s %s" % (i,c[0],c[1]))

    connection.close()

If you happen to be using a version of the Gremlin Python client that is older than 3.5.0, this line:

connection = DriverRemoteConnection(endpoint,'g', transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

would just be:

connection = DriverRemoteConnection(endpoint,'g')
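
The version rule above can be expressed as a small helper, sketched here under the assumption that the client version is available as a plain dotted string such as "3.4.13". The helper names `gremlin_endpoint` and `needs_event_loop_transport` are hypothetical and are not part of the Gremlin Python client.

```python
import re


def gremlin_endpoint(server: str, port: int = 8182) -> str:
    """Build the WebSocket URL in the same form as the example above."""
    return f"wss://{server}:{port}/gremlin"


def needs_event_loop_transport(client_version: str) -> bool:
    """Return True when the client version is 3.5.0 or later.

    Per the text, only those versions pass the `transport_factory`
    argument; older clients use DriverRemoteConnection(endpoint, 'g').
    """
    # Keep the first three numeric components and pad with zeros.
    parts = [int(n) for n in re.findall(r"\d+", client_version)[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= (3, 5, 0)


# In a notebook, the installed version could be read with the standard library:
#   from importlib.metadata import version
#   use_transport = needs_event_loop_transport(version("gremlinpython"))
print(gremlin_endpoint("(your server endpoint)"))
print(needs_event_loop_transport("3.4.13"), needs_event_loop_transport("3.5.0"))
```

A notebook cell could then branch on the result to decide which of the two `DriverRemoteConnection` calls to make.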

Enabling CloudWatch logs on Neptune Notebooks

CloudWatch logs are now enabled by default for Neptune Notebooks. If you have an older notebook that is not producing CloudWatch logs, follow these steps to enable them manually:

  1. Sign in to the AWS Management Console and open the SageMaker console.

  2. In the navigation pane on the left, choose Notebook, then Notebook Instances. Look for the name of the Neptune notebook for which you would like to enable logs.

  3. Go to the details page by selecting the name of that notebook instance.

  4. If the notebook instance is running, select the Stop button, at the top right of the notebook details page.

  5. Under Permissions and encryption there is a field for IAM role ARN. Select the link in this field to go to the IAM role that this notebook instance runs with.

  6. Create the following policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogDelivery",
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:DeleteLogDelivery",
            "logs:Describe*",
            "logs:GetLogDelivery",
            "logs:GetLogEvents",
            "logs:ListLogDeliveries",
            "logs:PutLogEvents",
            "logs:PutResourcePolicy",
            "logs:UpdateLogDelivery"
          ],
          "Resource": "*"
        }
      ]
    }

  7. Save this new policy and attach it to the IAM role found in Step 5.

  8. Choose Start at the top right of the SageMaker notebook instance details page.

  9. When logs start flowing, you should see a View Logs link beneath the field labeled Lifecycle configuration near the bottom left of the Notebook instance settings section of the details page.
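
As an alternative to the console for steps 6 and 7, the policy could be created and attached with the AWS SDK for Python. This is a sketch, not the documented procedure: it assumes you pass in a boto3 IAM client, and the function name `attach_logs_policy` and the default policy name are hypothetical. It uses the IAM client calls `create_policy` and `attach_role_policy`.

```python
import json

# The CloudWatch Logs permissions listed in step 6.
LOGS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogDelivery",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DeleteLogDelivery",
                "logs:Describe*",
                "logs:GetLogDelivery",
                "logs:GetLogEvents",
                "logs:ListLogDeliveries",
                "logs:PutLogEvents",
                "logs:PutResourcePolicy",
                "logs:UpdateLogDelivery",
            ],
            "Resource": "*",
        }
    ],
}


def attach_logs_policy(iam_client, role_name,
                       policy_name="NeptuneNotebookCloudWatchLogs"):
    """Create the logs policy and attach it to the notebook's IAM role.

    `iam_client` is expected to behave like boto3.client("iam");
    `policy_name` is a hypothetical default chosen for this sketch.
    Returns the ARN of the newly created policy.
    """
    created = iam_client.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(LOGS_POLICY),
    )
    policy_arn = created["Policy"]["Arn"]
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn


# Example call (requires boto3 and AWS credentials, so it is not run here):
#   import boto3
#   attach_logs_policy(boto3.client("iam"), "(your notebook role name)")
```

Passing the client in as an argument keeps the function easy to exercise with a stub before it touches a real account.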

If a notebook fails to start, a message appears on the notebook details page in the SageMaker console, stating that the notebook instance took over 5 minutes to start. CloudWatch logs relevant to this issue can be found under this name:


Setting up graph notebooks on your local machine

The graph-notebook project has instructions for setting up Neptune notebooks on your local machine:

You can connect your local notebooks either to a Neptune DB cluster, or to a local or remote instance of an open-source graph database.

Using Neptune notebooks with Neptune clusters

If you are connecting to a Neptune cluster on the back end, you may want to run the notebooks in Amazon SageMaker. Connecting to Neptune from SageMaker can be more convenient than from a local installation of the notebooks, and it will let you work more easily with Neptune ML.

For instructions about how to set up notebooks in SageMaker, see Launching graph-notebook using Amazon SageMaker.

For instructions about how to set up and configure Neptune itself, see Setting up Neptune.

You can also connect a local installation of the Neptune notebooks to a Neptune DB cluster. This can be somewhat more complicated because Amazon Neptune DB clusters can only be created in an Amazon Virtual Private Cloud (VPC), which is by design isolated from the outside world. There are a number of ways to connect to a VPC from outside it. One is to use a load balancer. Another is to use VPC peering (see the Amazon Virtual Private Cloud Peering Guide).

The most convenient way for most people, however, is to set up an Amazon EC2 proxy server within the VPC and then use SSH tunnelling (also called port forwarding) to connect to it. You can find instructions about how to set this up at Connecting graph notebook locally to Amazon Neptune in the additional-databases/neptune folder of the graph-notebook GitHub project.
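
To illustrate what SSH port forwarding through such a proxy looks like, the sketch below builds a standard `ssh -L` command line in Python. It is not taken from the graph-notebook instructions, which remain the reference for the full setup (including details such as TLS host-name handling). The function name, the host names, the key path, and the `ec2-user` login are all hypothetical placeholders.

```python
import shlex


def ssh_tunnel_command(neptune_endpoint, proxy_host, key_path,
                       user="ec2-user", port=8182):
    """Return an ssh argument list that forwards a local port to Neptune.

    Traffic sent to localhost:<port> is relayed through the EC2 proxy
    to <neptune_endpoint>:<port> inside the VPC.
    """
    return [
        "ssh",
        "-i", key_path,                             # private key for the proxy
        "-N",                                       # forward only, no remote shell
        "-L", f"{port}:{neptune_endpoint}:{port}",  # local port forward
        f"{user}@{proxy_host}",
    ]


# Hypothetical values, for illustration only.
cmd = ssh_tunnel_command(
    neptune_endpoint="(your server endpoint)",
    proxy_host="(your EC2 proxy address)",
    key_path="(path to your key file)",
)
print(shlex.join(cmd))
```

The printed command can be run in a terminal, or the list can be passed to `subprocess.run` once the placeholders are replaced with real values.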

Using Neptune notebooks with open-source graph databases

To get started with graph technology at no cost, you can also use Neptune notebooks with various open-source databases on the back end. Examples include TinkerPop Gremlin Server and the Blazegraph database.

To use Gremlin Server as your back-end database, follow the instructions at:

To use a local instance of Blazegraph as your back-end database, follow these instructions: