Accessing the Neptune Graph with openCypher
Neptune supports building graph applications using openCypher, currently one of the most popular query languages for developers working with graph databases. Developers, business analysts, and data scientists like openCypher’s SQL-inspired syntax because it provides a familiar structure to compose queries for graph applications.
openCypher is a declarative query language for property graphs that was originally developed by Neo4j, then open-sourced in 2015, and contributed to the openCypher project.
For the limitations and differences in Neptune support of the openCypher specification, see openCypher standards compliance in Amazon Neptune.
Note
The current Neo4j implementation of the Cypher query language has diverged in some ways from the openCypher specification. If you are migrating current Neo4j Cypher code to Neptune, see Neptune compatibility with Neo4j and Rewriting Cypher queries to run in openCypher on Neptune for help.
Starting with engine release 1.1.1.0, openCypher is available for production use in Neptune.
Gremlin vs. openCypher: similarities and differences
Gremlin and openCypher are both property-graph query languages, and they are complementary in many ways.
Gremlin was designed to appeal to programmers and fit seamlessly into code. As a result, Gremlin is imperative by design, whereas openCypher's declarative syntax may feel more familiar for people with SQL or SPARQL experience. Gremlin might seem more natural to a data scientist using Python in a Jupyter notebook, whereas openCypher might seem more intuitive to a business user with some SQL background.
The nice thing is that you don't have to choose between Gremlin and openCypher in Neptune. Queries in either language can operate on the same graph, regardless of which of the two languages was used to enter the data. You may find it more convenient to use Gremlin for some tasks and openCypher for others.
Gremlin uses an imperative syntax that lets you control how you move through your graph in a series of steps, each of which takes in a stream of data, performs some action on it (using a filter, map, and so forth), and then outputs the results to the next step. A Gremlin query commonly takes the form g.V(), followed by additional steps.
In openCypher, you use a declarative syntax, inspired by SQL, that specifies a pattern of nodes and relationships to find in your graph using a motif syntax (like ()-[]->()). An openCypher query often starts with a MATCH clause, followed by other clauses such as WHERE, WITH, and RETURN.
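As a rough illustration of that clause structure, the sketch below chains MATCH, WHERE, WITH, and RETURN in one query. The airport label appears elsewhere on this page, but the route relationship type and the code and city properties are assumptions made for the example, and the Gremlin traversal in the comment is only an approximate counterpart.

```cypher
// Assumed data model (hypothetical): `airport` nodes with `code` and `city`
// properties, connected by `route` relationships.
//
// An approximately equivalent Gremlin traversal, written as a chain of steps:
//   g.V().hasLabel('airport').has('code', 'SEA')
//        .out('route').hasLabel('airport').dedup().values('city')

MATCH (a:airport)-[:route]->(d:airport)   // the pattern to look for
WHERE a.code = 'SEA'                      // keep only matches that start at SEA
WITH DISTINCT d                           // pass each destination on once
RETURN d.city AS city                     // project the value to return
```

The openCypher version describes what the result should look like and leaves the traversal order to the engine, whereas the Gremlin version spells out each step in sequence.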
Getting started using openCypher
You can query property-graph data in Neptune using openCypher regardless of how it was loaded, but you can't use openCypher to query data loaded as RDF.
The Neptune bulk loader accepts property-graph data in a CSV format for Gremlin and in a CSV format for openCypher. You can also add property data to your graph using Gremlin queries, openCypher queries, or both.
There are many online tutorials available for learning the Cypher query language. Here, a few quick examples of openCypher queries may help you get an idea of the language, but by far the best and easiest way to get started using openCypher to query your Neptune graph is by using the openCypher notebooks in the Neptune workbench. The workbench is open-source, and is hosted on GitHub at https://github.com/aws-samples/amazon-neptune-samples. You'll find the openCypher notebooks in the GitHub Neptune graph-notebook repository.
Data processed by openCypher takes the form of an unordered series of key/value maps. The main way to refine, manipulate, and augment these maps is to use clauses that perform tasks such as pattern matching, insertion, update, and deletion on the key/value pairs.
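The following sketch shows what insertion, update, and deletion clauses can look like, using the standard openCypher CREATE, SET, and DETACH DELETE clauses. The code and city properties and their values are hypothetical, and each statement is meant to be submitted as its own query. Check the standards compliance page for any Neptune-specific restrictions before relying on a particular clause.

```cypher
// Insertion: create one node with a label and two properties (hypothetical values).
CREATE (a:airport {code: 'XYZ', city: 'Example City'})
RETURN a

// Update: find the node by property value and change one of its properties.
MATCH (a:airport {code: 'XYZ'})
SET a.city = 'Renamed City'
RETURN a

// Deletion: remove the node along with any relationships attached to it.
MATCH (a:airport {code: 'XYZ'})
DETACH DELETE a
```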
There are several clauses in openCypher for finding data patterns in the graph, of which MATCH is the most common. MATCH lets you specify the pattern of nodes, relationships, and filters that you want to look for in your graph. For example:
- Get all nodes
  MATCH (n) RETURN n
- Find connected nodes
  MATCH (n)-[r]->(d) RETURN n, r, d
- Find a path
  MATCH p=(n)-[r]->(d) RETURN p
- Get all nodes with a label
  MATCH (n:airport) RETURN n
Note that the first query above returns every single node in your graph, and the next two return every node that has a relationship. This is not generally recommended! In almost all cases, you want to narrow down the data being returned, which you can do by specifying node or relationship labels and properties, as in the fourth example.
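As a minimal sketch of that kind of narrowing, the query below restricts the pattern by node label, relationship type, and a property value, and caps the number of rows returned. The route relationship type and the code and city properties are assumptions for illustration; substitute the labels and properties present in your own graph.

```cypher
// Narrow the match: only `airport` nodes, only `route` relationships,
// and only routes that start at the airport whose (assumed) code is 'SEA'.
MATCH (a:airport {code: 'SEA'})-[r:route]->(d:airport)
RETURN d.code, d.city   // return just the properties you need
LIMIT 10                // cap the size of the result
```

Returning specific properties rather than whole nodes, together with LIMIT, keeps the result small while you explore an unfamiliar graph.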
You can find a handy cheat-sheet for openCypher syntax in the Neptune GitHub sample repository.