Neptune Analytics algorithms - Neptune Analytics

Neptune Analytics algorithms

Graph algorithms are powerful tools for gaining insights into data. Neptune Analytics provides a set of optimized in-database implementations of common graph algorithms that are exposed as openCypher procedures. These algorithms analyze inherent aspects of the underlying graph structure, such as connectedness (path finding), relative importance (centrality), and community membership (community detection).

Neptune Analytics natively supports over 25 optimized graph algorithms and variants in the 5 most popular categories that help customers extract insights from their graphs, which are listed in the following table.

Category Action Algorithms Common Uses

Pathfinding

Find the existence, quality, or availability of a path between nodes.

  • Breadth-First Search

  • Single-Source Shortest Path

  • Top-K Source Shortest Path

  • Logistics optimization

  • Social network recommendations

  • Route optimization

Centrality

Determines the absolute or relative importance of a node in the graph.

  • Degree

  • PageRank

  • Closeness Centrality

  • Fraud ring/Collusion detection

  • Social network influencer identification

  • Supply chain risk analysis

Similarity

Compare the similarities between different graph structures.

  • Common Neighbors

  • Total Neighbors

  • Jaccard Similarity

  • Overlap Similarity

  • Biological structural analysis

  • Social network cluster comparison

  • Link prediction

Clustering and Community Detection

Identify meaningful groups or clusters within graph structures.

  • Weakly Connected Components (WCC)

  • Strongly Connected Components (SCC)

  • Label Propagation

  • Social network clusters

  • Fraud ring identification

  • Householding

  • Biological interaction

Vector Similarity Search

Identify approximate nearest neighbor (ANN) nodes by comparing vector embeddings using the Hierarchical Navigable Small World (HNSW) algorithm.

  • Distance

  • Top-K

  • RAG applications

  • Knowledge graph backed chat bots

  • Approximate nearest neighbors

Many of these algorithms require interacting with most or all the nodes and edges in a graph, often in an iterative fashion. As a result, they are too computationally expensive to process using normal analytic technologies. Neptune Analytics has built highly optimized implementations that allow them to run over graphs of any size.

Algorithms in Neptune Analytics are integrated naturally into openCypher through the CALL clause, as illustrated below. This lets you combine algorithms naturally with openCypher clauses, functions, and semantics to build very complex queries. For example, you could look for the top 10 most important airports in the US-AK region like this:

MATCH (n:airport {region: 'US-AK'}) CALL neptune.algo.pageRank(n, {edgeLabels: ['route'], numOfIterations: 10}) YIELD rank RETURN n.code, rank ORDER BY rank DESC LIMIT 10

You can run algorithms in the SDKs using the ExecuteOpenCypherQuery operation or in boto3 and the AWS CLI using the execute-query command. If you don't want to use the SDK or CLI, you can use you can use awscurl to sign your Neptune Analytics requests using signed using Signature Version 4 (Sig4). For example, you can run a simple breadth-first search like this:

awscurl -X POST -H "Content-Type: application/x-www-form-urlencoded" \ https://(graphIdentifier).(region).neptune-graph.amazonaws.com/opencypher \ --service neptune-graph \ --region (region) \ -d "query=CALL neptune.algo.bfs([\"101\", \"102\"], {edgeLabels: [\"route\"]})"

You could run the same query using the AWS CLI like this:

aws neptune-graph execute-query \ --graph-identifier ${graphIdentifier} \ --query-string 'CALL neptune.algo.bfs(["101", "102"], {edgeLabels: ["route"]})' \ --language open_cypher \ /tmp/out.txt

Algorithms having signatures with different kinds of input in Neptune Analytics are exposed as separate algorithms. Unless otherwise indicated, the examples here are using the Air Routes dataset.

Neptune Analytics currently supports five main categories of algorithm:

  • Path finding algorithms   –   These find the existence, quality, or availability of a path or paths between two or more nodes in the graph. A path in this sense is a set of nodes and connecting edges.

    By efficiently determining the optimal route between two nodes, path-finding algorithms enable you to model real-world systems like roads or social networks as interconnected nodes and edges. Finding the shortest paths between various points is crucial in applications like route planning for GPS systems, logistics optimization, and even in solving complex problems in fields like biology or engineering.

  • Centrality algorithms   –   These are used to determine the absolute or relative importance or influence of a node or nodes in the graph.

    By identifying the most influential or important nodes within a network, centrality algorithms can provide insights about key players or critical points of interaction. This is valuable in social network analysis, where it helps pinpoint influential individuals, and in transportation networks, where it aids in identifying crucial hubs for efficient routing and resource allocation.

  • Similarity algorithms   –   Graph similarity algorithms allow you to compare and analyze the similarities and dissimilarities between different graph structures, which can provide insight into relationships, patterns, and commonalities across diverse datasets. This is invaluable in various fields, such as biology, for comparing molecular structures, such as social networks, for identifying similar communities, and such as recommendation systems, for suggesting similar items based on user preferences.

  • Clustering or community-detection algorithms   –   Community-detection algorithms can identify meaningful groups or clusters of nodes in a network, revealing hidden patterns and structures that can provide insights into the organization and dynamics of complex systems. This is valuable in social network analysis, and in biology, for identifying functional modules in protein-protein interaction networks, and more generally for understanding information flow and influence propagation in many different domains.

  • Vector Similarity Search   –   Vector similarity algorithms work by using vector based representations of data, a.k.a. embeddings, to answer questions about the data's context and its similarity and connection to other data. This is valuable in applications such as Retrieval Augmented Generation (RAG) applications, knowledge graph backed chatbots, and recommendation engines.

Custom Algorithms

Property graph information

Property Graph Information (graph.pg_info) summarizes some of the basic metrics of the graph, such as the number of vertices, the number of edges, the number of edge properties, the number of vertex properties, the number of edge labels, and the number of vertex labels.

Inputs for graph.pg_info

There are no inputs for graph.pg_info.

Outputs for graph.pg_info

There are two columns in the output relation: the first column is the metric name and the second column is the count.

metric: the metrics that graph.pg_info will return, which include:
  • numVertices: the number of vertices in the graph.

  • numEdges: the number of edges in the graph.

  • numVertexProperties: the number of node properties in the graph.

  • numEdgeProperties: the number of edge properties in the graph.

  • numVertexLabels: the number of unique vertex labels in the graph.

  • numEdgeLabels: the number of unique edge labels in the graph.

count
  • count: the value of the above metrics.

graph.pg_info query example

## Syntax CALL neptune.graph.pg_info() YIELD metric, count RETURN metric, count

graph.pg_info query integration

# sample query integration CALL neptune.graph.pg_info() YIELD metric, count WHERE metric = 'numVertices' RETURN count

Sample graph.pg_info output

# sample output of graph.pg_info aws neptune-graph execute-query \ --graph-identifier ${graphIdentifier} \ --query-string "CALL neptune.graph.pg_info() YIELD metric, count RETURN metric, count " \ --language open_cypher \ /tmp/out.txt cat /tmp/out.txt { "results": [{ "metric": "numVertices", "count": 3748 }, { "metric": "numEdges", "count": 57538 }, { "metric": "numVertexProperties", "count": 42773 }, { "metric": "numEdgeProperties", "count": 50532 }, { "metric": "numVertexLabels", "count": 4 }, { "metric": "numEdgeLabels", "count": 2 }] }