Amazon Neptune
User Guide (API Version 2017-11-29)

Best Practices: Getting the Most Out of Neptune

Use this information as a reference to quickly find recommendations for using Amazon Neptune and maximizing performance.

General Best Practices

The following are some general recommendations for working with Neptune.

Load Balancing Across Read Replicas

You can load balance requests across read replicas by connecting to instance endpoints explicitly. Use the instance endpoints to direct requests to specific read replicas. You must perform any load balancing on the client side.

The read-only (ro) endpoint does not provide any load balancing.
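The client-side round-robin logic can be sketched as follows. This is a minimal illustration, not part of any Neptune SDK; the endpoint names are hypothetical placeholders, and a real client would open a Gremlin or SPARQL connection to each selected endpoint.

```python
import itertools

# Hypothetical instance endpoints for three read replicas.
replica_endpoints = [
    "my-replica-1.xxxxxxxx.us-east-1.neptune.amazonaws.com",
    "my-replica-2.xxxxxxxx.us-east-1.neptune.amazonaws.com",
    "my-replica-3.xxxxxxxx.us-east-1.neptune.amazonaws.com",
]

# Cycle through the replicas so that successive read requests are
# spread evenly across all instance endpoints.
_next_endpoint = itertools.cycle(replica_endpoints).__next__

def endpoint_for_next_read():
    """Return the instance endpoint that the next read request should use."""
    return _next_endpoint()
```

Each call to endpoint_for_next_read() returns the next replica in turn, wrapping around after the last one.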

Loader

The following best practices can help you improve performance when loading data into a Neptune DB instance. For more information about loading data in Neptune, see Loading Data into Neptune.

Loading Faster Using a Temporary Larger Instance

Load performance improves with larger instance sizes. If your normal workload doesn't require a large instance type but you want faster loads, you can use a temporary larger instance to perform the load and then delete it.

Note

The following procedure is for a new cluster. If you have an existing cluster, you can add a new larger instance and then promote it to a primary DB instance.

To load data using a larger instance size

  1. Create a cluster with a single r4.8xlarge instance. This instance is the primary DB instance.

  2. Create one or more read replicas with your desired instance size.

  3. Load your data using the Neptune loader. The load job runs on the primary DB instance.

  4. After the data is finished loading, delete the primary DB instance.
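The load in step 3 is started by sending an HTTP POST to the loader endpoint on the primary DB instance. The following Python sketch only builds the JSON request body; the S3 location and IAM role ARN are hypothetical placeholders, and sending the request is left out.

```python
import json

def build_loader_request(source_uri, iam_role_arn, region, fmt="csv"):
    """Build the JSON body for a POST to https://<primary-endpoint>:8182/loader."""
    return {
        "source": source_uri,        # S3 location of the data to load
        "format": fmt,               # csv, ntriples, nquads, rdfxml, or turtle
        "iamRoleArn": iam_role_arn,  # role that allows Neptune to read from S3
        "region": region,
        "failOnError": "TRUE",
    }

# Hypothetical bucket and role ARN for illustration only.
body = json.dumps(build_loader_request(
    "s3://example-bucket/neptune-data/",
    "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "us-east-1",
))
```

The resulting body can then be POSTed to the loader endpoint with any HTTP client.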

Gremlin

Follow these recommendations when using the Gremlin graph traversal language with Neptune. For information about using Gremlin with Neptune, see Accessing the Neptune Graph with Gremlin.

Pruning Records with the Creation Time Property

You can prune stale records by storing the creation time as a property on vertices and dropping them periodically.

If you need to store data for a specific lifetime and then remove it from the graph (vertex time to live), you can store a time stamp property at the creation of the vertex. You can then periodically issue a drop() query for all vertices that were created before a certain time; for example:

g.V().has('timestamp', lt(datetime('2018-10-11'))).drop()
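A client typically computes the cutoff date from a retention window before issuing the query. The following Python sketch assembles such a drop() query string; the 'timestamp' property name matches the example above, and the query is built but not submitted.

```python
import datetime

def prune_query(retention_days, today=None):
    """Return a Gremlin query string that drops every vertex whose creation
    timestamp falls before the retention window."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=retention_days)
    # Drop every vertex created before the cutoff date.
    return "g.V().has('timestamp', lt(datetime('%s'))).drop()" % cutoff.isoformat()
```

You would run the returned query periodically (for example, from a scheduled job) against your Neptune endpoint.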

Using the datetime() Method for Groovy Time Data

Neptune provides the datetime method for specifying dates and times for queries sent in the Gremlin Groovy variant. This includes the Gremlin Console, text strings using the HTTP REST API, and any other serialization that uses Groovy.

Important

This only applies to methods where you send the Gremlin query as a text string. If you are using a Gremlin Language Variant (GLV), you must use the native date classes and functions for the language. For more information, see the next section, Using Native Date and Time for GLV Time Data.

You can use the datetime method to store and compare dates:

g.V('3').property('date',datetime('2001-02-08'))
g.V().has('date',gt(datetime('2000-01-01')))

Using Native Date and Time for GLV Time Data

If you are using a Gremlin Language Variant (GLV), you must use the native date and time classes and functions provided by the programming language for Gremlin time data.

The official TinkerPop Java, Node.js (JavaScript), Python, or .NET libraries are all Gremlin Language Variant (GLV) libraries.

Important

This only applies to Gremlin Language Variant (GLV) libraries. If you are using a method where you send the Gremlin query as a text string, you must use the datetime() method provided by Neptune. This includes the Gremlin Console, text strings using the HTTP REST API, and any other serialization that uses Groovy. For more information, see the preceding section, Using the datetime() Method for Groovy Time Data.

Python

The following is a partial example in Python that creates a single property named 'date' for the vertex with an ID of '3'. It sets the value to a date generated using the Python datetime.now() method.

import datetime
g.V('3').property('date', datetime.datetime.now()).next()

For a complete example of connecting to Neptune using Python, see Using Python to Connect to a Neptune DB Instance.

Node.js (JavaScript)

The following is a partial example in JavaScript that creates a single property named 'date' for the vertex with an ID of '3'. It sets the value to a date generated using the Node.js Date() constructor.

g.V('3').property('date', new Date()).next()

For a complete example of connecting to Neptune using Node.js, see Using Node.js to Connect to a Neptune DB Instance.

Java

The following is a partial example in Java that creates a single property named 'date' for the vertex with an ID of '3'. It sets the value to a date generated using the Java Date() constructor.

import java.util.Date;
g.V('3').property('date', new Date()).next();

For a complete example of connecting to Neptune using Java, see Using Java to Connect to a Neptune DB Instance.

.NET (C#)

The following is a partial example in C# that creates a single property named 'date' for the vertex with an ID of '3'. It sets the value to a date generated using the .NET DateTime.UtcNow property.

using System;
g.V('3').property('date', DateTime.UtcNow).next()

For a complete example of connecting to Neptune using C#, see Using .NET to Connect to a Neptune DB Instance.

Sharing a Single Gremlin Java Client Instance Across Multiple Threads

Create only one instance of the org.apache.tinkerpop.gremlin.driver.Client class per Neptune instance (or group of instances) and share it across multiple threads. That is, call Client client = Cluster.connect() only once rather than doing so in each thread.

Note

This also applies to GraphTraversalSource, which creates an internal instance of the Client class. For example, the following code will create a Client instance.

GraphTraversalSource traversal = EmptyGraph.instance().traversal().withRemote(DriverRemoteConnection.using(cluster));
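The single-shared-client pattern can be sketched generically as follows. This Python mock stands in for the Java driver's Client class purely for illustration; it is not the real driver API. The point is that the client is constructed once, outside the worker threads, and every thread submits through that same instance.

```python
import threading

class MockClient:
    """Stand-in for org.apache.tinkerpop.gremlin.driver.Client (illustration only)."""
    instances_created = 0

    def __init__(self):
        MockClient.instances_created += 1
        self._lock = threading.Lock()

    def submit(self, query):
        # A real client submits the query over a pooled WebSocket connection.
        with self._lock:
            return "result of %s" % query

# Create the client ONCE, outside any worker thread...
shared_client = MockClient()

def worker(query, results):
    # ...and share that single instance across all threads.
    results.append(shared_client.submit(query))

results = []
threads = [threading.Thread(target=worker, args=("g.V().count()", results))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All four threads complete their queries while only one client instance is ever created.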

Create Separate Gremlin Java Client Instances for Read and Write Endpoints

You can increase performance by only performing writes on the writer endpoint and reading from one or more read-only endpoints.

Client readerClient = Cluster.build("http://reader-endpoint:8182/gremlin")
    ...
    .connect()

Client writerClient = Cluster.build("http://writer-endpoint:8182/gremlin")
    ...
    .connect()

Adding Multiple Read Replica Endpoints to a Gremlin Java Connection Pool

When creating a Gremlin Java Driver Cluster object, you can use the .addContactPoint() method to add multiple read replica instances to the connection pool's contact points.

Cluster.Builder readerBuilder = Cluster.build()
    .port(8182)
    .minConnectionPoolSize(…)
    .maxConnectionPoolSize(…)
    ………
    .addContactPoint("reader-endpoint-1")
    .addContactPoint("reader-endpoint-2")

Explicitly Closing Gremlin Java Driver Connections to Avoid Connection Limit

If you do not explicitly close your connections to Amazon Neptune, they can be kept alive until you reach the limit of 60,000 WebSocket connections. After that, additional connection attempts are refused and an HTTP 429 error is returned.

Cluster.close() closes all the Client instances created from the cluster and all the connections created by those Client instances.

If you reach the connection limit, you must restart the Neptune instance to close the existing connections.
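The close-on-exit discipline can be sketched with a try/finally pattern. The MockCluster below is a placeholder standing in for the Gremlin driver's Cluster; in Java the equivalent is calling cluster.close() in a finally block (or using try-with-resources).

```python
class MockCluster:
    """Stand-in for the Gremlin driver's Cluster (illustration only)."""
    def __init__(self):
        self.closed = False

    def connect(self):
        # A real cluster would hand back a Client bound to its connection pool.
        return "client"

    def close(self):
        # Closes every Client created from this cluster and all of their
        # underlying WebSocket connections.
        self.closed = True

cluster = MockCluster()
try:
    client = cluster.connect()
    # ... run queries ...
finally:
    # Always close, even if a query raised an exception, so that connections
    # are not left open counting against the 60,000-connection limit.
    cluster.close()
```

Because the close happens in the finally block, the connections are released on both the success and the error path.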

Avoiding Memory Leaks Due to Keep-Alive Bug in TinkerPop Gremlin Java Driver

There is a bug in the Gremlin Java driver where a keep-alive task is executed 30 minutes after every Connection.write call, which can result in a build-up of memory usage. For more information, see TINKERPOP-2030.

To avoid this, you can upgrade to Gremlin Java Driver version 3.3.4+.

As a workaround for earlier versions, you can set the keepAliveInterval to 0 as shown in the following code.

Cluster.Builder readerBuilder = Cluster.build()
    .port(8182)
    …
    .keepAliveInterval(0)

Creating a New Connection After Failover

After a failover, the Gremlin driver might continue connecting to the old writer because the cluster endpoint's DNS name was already resolved to an IP address. If this happens, create a new Client object after the failover.
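The recreate-on-failure idea can be sketched as a small retry wrapper. Everything here is a hypothetical illustration: make_client stands in for building a new driver Client (which forces a fresh DNS resolution of the cluster endpoint), and the failure condition a real application would check (for example, a write rejected by a now read-only instance) is abstracted into a generic exception.

```python
def make_client():
    """Hypothetical factory that builds a new client against the cluster
    endpoint, forcing a fresh DNS lookup of the current writer."""
    return {"connected": True}

client = make_client()

def submit_write(query, submit):
    """Submit a write query. If it fails (e.g., the old writer became
    read-only after failover), rebuild the client once and retry."""
    global client
    try:
        return submit(client, query)
    except Exception:
        # Recreating the client re-resolves DNS to the new writer.
        client = make_client()
        return submit(client, query)
```

A production version would inspect the error before retrying and cap the number of reconnection attempts.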

SPARQL

Follow these best practices when using the SPARQL query language with Neptune. For information about using SPARQL in Neptune, see Accessing the Neptune Graph with SPARQL.

Querying All Named Graphs by Default

Amazon Neptune associates every triple with a named graph. The default graph is defined as the union of all named graphs.

If you submit a SPARQL query without explicitly specifying a graph via the GRAPH keyword or constructs such as FROM NAMED, Neptune always considers all triples in your DB instance. For example, the following query returns all triples from a Neptune SPARQL endpoint:

SELECT * WHERE { ?s ?p ?o }

Triples that appear in more than one graph are returned only once.

For information about the default graph specification, see the RDF Dataset section of the SPARQL 1.1 Query Language specification.
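Such a query is typically sent to the SPARQL endpoint as an HTTP POST with a form-encoded query parameter. The following Python sketch only builds the request body; the endpoint URL shown in the comment is a placeholder, and sending the request is omitted.

```python
from urllib.parse import urlencode

def sparql_request_body(query):
    """Form-encode a SPARQL query for a POST to https://<endpoint>:8182/sparql."""
    return urlencode({"query": query})

# Query over the default graph (the union of all named graphs).
body = sparql_request_body("SELECT * WHERE { ?s ?p ?o }")
```

The encoded body can be POSTed with the Content-Type header application/x-www-form-urlencoded.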

Specifying a Named Graph for Load

Amazon Neptune associates every triple with a named graph. If you don't specify a named graph when loading, inserting, or updating triples, Neptune uses the fallback named graph defined by the URI http://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph.

You can specify the named graph to use for all triples (or quads with the fourth position blank) by using the parserConfiguration: namedGraphUri parameter. For information about the Load command syntax, see Loader Command.
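A loader request that sets the named graph might be assembled as in the following Python sketch. The S3 location, IAM role ARN, and graph URI are hypothetical placeholders; only the parserConfiguration / namedGraphUri structure reflects the loader parameter described above.

```python
import json

# Loader request body that assigns all loaded triples to a specific named graph.
load_request = {
    "source": "s3://example-bucket/rdf-data/",                       # placeholder
    "format": "ntriples",
    "iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",  # placeholder
    "region": "us-east-1",
    "parserConfiguration": {
        # Every triple (or quad with a blank fourth position) is placed
        # in this named graph instead of the fallback graph.
        "namedGraphUri": "http://example.com/graphs/my-graph"
    },
}
body = json.dumps(load_request)
```

Without the parserConfiguration block, the triples would fall back to the default named graph URI given above.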
