Getting set up with Neptune Jupyter notebooks - Amazon Neptune

Getting set up with Neptune Jupyter notebooks

The notebooks in the graph notebook project are all open-source, located in the /src/graph_notebook/notebooks folder of the graph-notebook GitHub repository.

You can use them to walk through setting up, configuring, populating and querying graphs using different query languages, different data sets, and even different databases on the back end.

Using the Neptune workbench to host Neptune notebooks

An easy way to get started with Amazon Neptune is to use the Neptune workbench. The workbench lets you work with your Neptune DB cluster using Jupyter notebooks hosted by Amazon SageMaker, including the ones that Neptune provides in the graph notebook project.

Neptune offers T3 instance types that you can get started with for only $0.10/Hr (see the Neptune pricing page).

You are billed for workbench resources through Amazon SageMaker, separately from your Neptune billing.

You can use the Neptune console to set up the Neptune workbench in Amazon SageMaker and load the Neptune notebooks, or create a new Jupyter notebook of your own:

To create a Jupyter notebook using the Neptune workbench

  1. Make sure that the security group attached in the VPC where Neptune is running has a rule that allows inbound connections from SageMaker.

  2. Sign in to the AWS Management Console, and open the Amazon Neptune console.

  3. In the navigation pane on the left, choose Notebooks.

  4. Choose Create notebook.

  5. In the Cluster list, choose your Neptune DB cluster. If you don't yet have a DB cluster, choose Create cluster to create one.

  6. Give your notebook a name, and optionally a description.

  7. Unless you already created an AWS Identity and Access Management (IAM) role for your notebooks, choose Create an IAM role, and enter an IAM role name.

  8. Choose Create notebook. The creation process may take 10 to 15 minutes before everything is ready.

  9. After your notebook is created, select it and then choose Open notebook.
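Step 1 of this procedure depends on the network path between the notebook and the cluster being open. As a rough check, the following sketch tries a plain TCP connection from a notebook cell. The function name `can_reach` and the 5-second timeout are illustrative choices, the endpoint is a placeholder, and port 8182 is assumed because it is the port used in the Python connection example on this page. A successful connection only shows that the port is reachable; it says nothing about IAM permissions.

```python
import socket

def can_reach(host, port=8182, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False

# Placeholder endpoint: replace with your own cluster endpoint before running.
# print(can_reach('(your server endpoint)'))
```

If the check returns False from a SageMaker notebook, the security group rule from step 1 is one of the first things worth reviewing.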

The console can create an AWS Identity and Access Management (IAM) role for your notebooks, or you can create one yourself. The policy for this role should include the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::aws-neptune-notebook",
        "arn:aws:s3:::aws-neptune-notebook/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "neptune-db:connect",
      "Resource": [
        "your-cluster-arn/*"
      ]
    }
  ]
}
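If you create the role yourself, it can help to generate the policy document rather than edit the JSON by hand. The sketch below builds the same policy as a Python dictionary. `build_notebook_policy` is a hypothetical helper name, and the ARN passed in the usage line is the placeholder from the policy above, not a real resource.

```python
import json

def build_notebook_policy(cluster_arn):
    """Return the notebook role policy as a dict for the given cluster ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read access to the bucket that holds the Neptune notebooks.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::aws-neptune-notebook",
                    "arn:aws:s3:::aws-neptune-notebook/*",
                ],
            },
            {
                # Permission to connect to the Neptune DB cluster.
                "Effect": "Allow",
                "Action": "neptune-db:connect",
                "Resource": [f"{cluster_arn}/*"],
            },
        ],
    }

# "your-cluster-arn" is a placeholder; substitute your own value.
policy_json = json.dumps(build_notebook_policy("your-cluster-arn"), indent=2)
print(policy_json)
```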

Also, the role should establish the following trust relationship:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Again, getting everything ready to go can take 10 to 15 minutes.

Using Python to connect a generic SageMaker notebook to Neptune

Connecting a notebook to Neptune is easy if you have installed the Neptune magics, but it is also possible to connect a SageMaker notebook to Neptune using Python, even if you are not using a Neptune notebook.

Steps to take to connect to Neptune in a SageMaker notebook cell

  1. Install the Gremlin Python client:

    !pip install gremlinpython

    Neptune notebooks install the Gremlin Python client for you, so this step is only necessary if you're using a plain SageMaker notebook.

  2. Write code such as the following to connect and issue a Gremlin query:

    from gremlin_python import statics
    from gremlin_python.structure.graph import Graph
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.process.strategies import *
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.driver.aiohttp.transport import AiohttpTransport
    from gremlin_python.process.traversal import *
    import os

    port = 8182
    server = '(your server endpoint)'

    endpoint = f'wss://{server}:{port}/gremlin'

    graph=Graph()

    connection = DriverRemoteConnection(endpoint,'g',
                     transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

    g = graph.traversal().withRemote(connection)

    results = (g.V().hasLabel('airport').
                     sample(10).
                     order().by('code').
                     local(__.values('code','city').fold()).
                     toList())

    # Print the results in a tabular form with a row index
    for i,c in enumerate(results,1):
        print("%3d %4s %s" % (i,c[0],c[1]))

    connection.close()

If you happen to be using a version of the Gremlin Python client that is older than 3.5.0, this line:

connection = DriverRemoteConnection(endpoint,'g', transport_factory=lambda:AiohttpTransport(call_from_event_loop=True))

would just be:

connection = DriverRemoteConnection(endpoint,'g')
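To avoid editing the connection line by hand, the choice can be made from the installed client version. This is only a sketch: `parse_version`, `installed_client_version`, and `connection_kwargs` are hypothetical helper names, the version is read with the standard-library `importlib.metadata`, and the transport factory is passed in by the caller so the helpers themselves do not import `gremlin_python`.

```python
import re
from importlib import metadata

def parse_version(version):
    """Turn a version string such as '3.5.0' into a tuple of three ints."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))

def installed_client_version():
    """Look up the installed gremlinpython version (raises if not installed)."""
    return metadata.version("gremlinpython")

def connection_kwargs(client_version, transport_factory):
    """Extra keyword arguments for DriverRemoteConnection.

    Follows the rule stated above: clients older than 3.5.0 get no
    transport_factory argument.
    """
    if parse_version(client_version) >= (3, 5, 0):
        return {"transport_factory": transport_factory}
    return {}

# Usage in a notebook cell where the earlier example's names are defined:
# kwargs = connection_kwargs(installed_client_version(),
#                            lambda: AiohttpTransport(call_from_event_loop=True))
# connection = DriverRemoteConnection(endpoint, 'g', **kwargs)
```

On an older client, the `AiohttpTransport` import from the example would presumably need to be skipped as well, since this sketch assumes that module is only present in newer releases.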

Setting up graph notebooks on your local machine

The graph-notebook project has instructions for setting up Neptune notebooks on your local machine.

You can connect your local notebooks either to a Neptune DB cluster, or to a local or remote instance of an open-source graph database.

Using Neptune notebooks with Neptune clusters

If you are connecting to a Neptune cluster on the back end, you may want to run the notebooks in Amazon SageMaker. Connecting to Neptune from SageMaker can be more convenient than from a local installation of the notebooks, and it will let you work more easily with Neptune ML.

For instructions about how to set up notebooks in SageMaker, see Launching graph-notebook using Amazon SageMaker.

For instructions about how to set up and configure Neptune itself, see Setting up Neptune.

You can also connect a local installation of the Neptune notebooks to a Neptune DB cluster. This can be somewhat more complicated because Amazon Neptune DB clusters can only be created in an Amazon Virtual Private Cloud (VPC), which is by design isolated from the outside world. There are a number of ways to connect to a VPC from outside it. One is to use a load balancer. Another is to use VPC peering (see the Amazon Virtual Private Cloud Peering Guide).

The most convenient way for most people, however, is to set up an Amazon EC2 proxy server within the VPC and then use SSH tunneling (also called port forwarding) to connect to it. You can find instructions about how to set this up at Connecting graph notebook locally to Amazon Neptune in the additional-databases/neptune folder of the graph-notebook GitHub project.
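As a rough illustration of the port-forwarding idea, the sketch below assembles a standard OpenSSH local-forwarding command. The key file, user name, and host names are placeholders, `build_tunnel_command` is a hypothetical helper, and port 8182 is taken from the Python connection example earlier on this page. The graph-notebook instructions remain the reference for the full setup.

```python
import shlex

def build_tunnel_command(key_file, proxy_user, proxy_host, neptune_endpoint, port=8182):
    """Build an ssh command that forwards a local port to Neptune through the proxy."""
    return [
        "ssh",
        "-i", key_file,   # private key used to log in to the EC2 proxy
        "-N",             # forward ports only; do not run a remote command
        "-L", f"{port}:{neptune_endpoint}:{port}",  # local port -> Neptune via the proxy
        f"{proxy_user}@{proxy_host}",
    ]

# All values below are placeholders for illustration.
command = build_tunnel_command(
    "my-key.pem", "ec2-user", "proxy.example.com", "my-cluster.example.com"
)
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run` would start the tunnel from Python, although running the printed command in a terminal is usually simpler.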

Using Neptune notebooks with open-source graph databases

To get started with graph technology at no cost, you can also use Neptune notebooks with various open-source databases on the back end. Examples include TinkerPop Gremlin Server and the Blazegraph database.

To use Gremlin Server or a local instance of Blazegraph as your back-end database, follow the corresponding instructions in the graph-notebook project.