Amazon Neptune DB Clusters and Instances

An Amazon Neptune DB cluster manages access to your data through queries. A cluster consists of:

  • One primary DB instance.

  • Up to 15 read-replica DB instances.

All the instances in a cluster share the same underlying managed storage volume, which is designed for reliability and high availability.

You connect to the DB instances in your DB cluster through Neptune endpoints.
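For example, a connection through the cluster endpoint using the open-source gremlinpython driver might look like the following sketch. The hostname is a placeholder for your own cluster endpoint, and 8182 is Neptune's default port:

    # Minimal sketch: connect to a Neptune cluster endpoint with gremlinpython.
    # The hostname below is a placeholder -- substitute your own cluster endpoint.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(
        'wss://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin',
        'g')
    g = traversal().withRemote(conn)

    print(g.V().limit(5).toList())  # read a few vertices to verify connectivity
    conn.close()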

The primary DB instance in a Neptune DB cluster

The primary DB instance coordinates all write operations to the DB cluster's underlying storage volume. It also supports read operations.

There can only be one primary DB instance in a Neptune DB cluster. If the primary instance becomes unavailable, Neptune automatically fails over to one of the read-replica instances with a priority that you can specify.

Read-replica DB instances in a Neptune DB cluster

After you create the primary instance for a DB cluster, you can create up to 15 read-replica instances in your DB cluster to support read-only queries.

Neptune read-replica DB instances work well for scaling read capacity because they are fully dedicated to read operations on your cluster volume. All write operations are managed by the primary instance. Each read-replica DB instance has its own endpoint.

Because the cluster storage volume is shared among all instances in a cluster, all read-replica instances return the same data for query results with very little replication lag. This lag is usually much less than 100 milliseconds after the primary instance writes an update, although it can be somewhat longer when the volume of write operations is very large.

Having one or more read-replica instances available in different Availability Zones can increase availability, because read-replicas serve as failover targets for the primary instance. That is, if the primary instance fails, Neptune promotes a read-replica instance to become the primary instance. When this happens, there is a brief interruption while the promoted instance is rebooted, during which read and write requests made to the primary instance fail with an exception.

By contrast, if your DB cluster doesn't include any read-replica instances, your DB cluster remains unavailable when the primary instance fails until it has been re-created. Re-creating the primary instance takes considerably longer than promoting a read-replica.

To ensure high availability, we recommend that you create one or more read-replica instances that have the same DB instance class as the primary instance and are located in different Availability Zones than the primary instance. See Fault Tolerance for a Neptune DB Cluster.

When you create read-replicas across Availability Zones, Neptune automatically provisions and maintains them synchronously. The primary DB instance is synchronously replicated across Availability Zones to the read-replicas to provide data redundancy, eliminate I/O freezes, and minimize latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against failure and Availability Zone disruption.

Using the console, you can create a Multi-AZ deployment simply by specifying Multi-AZ when you create the DB cluster. If a DB cluster is in a single Availability Zone, you can make it a Multi-AZ DB cluster by adding a Neptune replica in a different Availability Zone.

Note

You can't create an encrypted read-replica instance for an unencrypted Neptune DB cluster, or an unencrypted read-replica instance for an encrypted Neptune DB cluster.

For details on how to create a Neptune read-replica DB instance, see Creating a Neptune Replica Using the Console.
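If you prefer to script this rather than use the console, the following boto3 sketch shows one way to add a replica; the identifiers, instance class, and Availability Zone are placeholder assumptions. Because the request names an existing DB cluster that already has a primary instance, Neptune creates the new instance as a read-replica:

    # Sketch: add a read-replica to an existing Neptune cluster with boto3.
    # All identifiers below are placeholders.
    import boto3

    neptune = boto3.client('neptune', region_name='us-east-1')

    neptune.create_db_instance(
        DBInstanceIdentifier='my-cluster-replica-1',
        DBInstanceClass='db.r5.large',     # same class as the primary, for HA
        Engine='neptune',
        DBClusterIdentifier='my-cluster',  # existing cluster to attach to
        AvailabilityZone='us-east-1b',     # a different AZ than the primary
        PromotionTier=1,                   # failover priority (lower = higher)
    )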

Sizing DB instances in a Neptune DB cluster

Size the instances in your Neptune DB cluster based on your CPU and memory requirements. The number of vCPUs on an instance determines the number of query threads that handle incoming queries. The amount of memory on an instance determines the size of the buffer cache, used for storing copies of data pages fetched from the underlying storage volume.

Each Neptune DB instance has a number of query threads equal to 2 x the number of vCPUs on that instance. An r4.xlarge, for example, with 4 vCPUs, has 8 query threads, and can therefore process 8 requests concurrently. An r5.4xlarge, with 16 vCPUs, has 32 query threads, and can therefore process 32 queries concurrently.
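As a worked example of that rule, this small sketch computes the query-thread count for the two instance sizes mentioned above (the vCPU counts are standard instance specifications):

    # The 2 x vCPU rule from this section, applied to two instance sizes.
    VCPUS = {'db.r4.xlarge': 4, 'db.r5.4xlarge': 16}

    def query_threads(instance_class):
        """Neptune allocates two query threads per vCPU."""
        return 2 * VCPUS[instance_class]

    for cls, cores in VCPUS.items():
        print(f'{cls}: {cores} vCPUs -> {query_threads(cls)} concurrent queries')
    # db.r4.xlarge: 4 vCPUs -> 8 concurrent queries
    # db.r5.4xlarge: 16 vCPUs -> 32 concurrent queries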

Additional queries that arrive while all query threads are occupied are put into a server-side queue and processed in FIFO order as query threads become available. This server-side queue can hold approximately 8,000 pending requests. Once it is full, Neptune responds to additional requests with a ThrottlingException. You can monitor the number of pending requests with the MainRequestQueuePendingRequests CloudWatch metric, or by using the Gremlin query status endpoint with the includeWaiting parameter.
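One way to watch the queue depth is to pull that metric with boto3, as in this sketch (the cluster identifier is a placeholder):

    # Sketch: read MainRequestQueuePendingRequests from CloudWatch.
    import datetime
    import boto3

    cw = boto3.client('cloudwatch', region_name='us-east-1')

    resp = cw.get_metric_statistics(
        Namespace='AWS/Neptune',
        MetricName='MainRequestQueuePendingRequests',
        Dimensions=[{'Name': 'DBClusterIdentifier', 'Value': 'my-cluster'}],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
        EndTime=datetime.datetime.utcnow(),
        Period=300,            # 5-minute datapoints
        Statistics=['Maximum'],
    )
    for point in sorted(resp['Datapoints'], key=lambda p: p['Timestamp']):
        print(point['Timestamp'], point['Maximum'])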

Query execution time from a client perspective includes any time spent in the queue, in addition to the time taken to actually execute the query.

A sustained concurrent write load that utilizes all the query threads on the primary DB instance ideally shows 90% or more CPU utilization, which indicates that all the query threads on the server are actively engaged in doing useful work. However, actual CPU utilization is often somewhat lower, even under a sustained concurrent write load. This is usually because query threads are waiting on I/O operations to the underlying storage volume to complete. Neptune uses quorum writes that make six copies of your data across three Availability Zones, and four out of those six storage nodes must acknowledge a write for it to be considered durable. While a query thread waits for this quorum from the storage volume, it is stalled, which reduces CPU utilization.

If you have a serial write load where you are performing one write after another and waiting for the first to complete before beginning the next, you can expect the CPU utilization to be lower still. The exact amount will be a function of the number of vCPUs and query threads (the more query threads, the less overall CPU per query), with some reduction caused by waiting for I/O.

Monitoring DB instance performance in Neptune

You can use CloudWatch metrics in Neptune to monitor what is happening on your DB instances and keep track of query latency as observed by the client. See Monitoring Neptune. The following metrics are particularly useful:

  • CPUUtilization   –   Shows the percentage of CPU utilization.

  • VolumeWriteIOPs   –   Shows the average number of disk I/O writes to the cluster volume, reported at 5-minute intervals.

  • MainRequestQueuePendingRequests   –   Shows the number of requests waiting in the input queue pending execution.

You can also find out how many requests are pending on the server by using the Gremlin query status endpoint with the includeWaiting parameter. This will give you the status of all waiting queries.
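A minimal sketch of calling that endpoint with the requests library follows. The hostname is a placeholder, and note that if IAM authentication is enabled on your cluster, the request must also be signed with Signature Version 4, which this sketch omits:

    # Sketch: query Neptune's Gremlin status endpoint, including waiting queries.
    import requests

    endpoint = 'https://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182'
    resp = requests.get(endpoint + '/gremlin/status',
                        params={'includeWaiting': 'true'})
    print(resp.json())  # query counts, plus details of running and waiting queries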

The following indicators can help you adjust your Neptune provisioning and query strategies to improve efficiency and performance:

  • Consistent latency, high CPUUtilization, high VolumeWriteIOPs, and low MainRequestQueuePendingRequests together show that the server is actively engaged in processing concurrent write requests at a sustainable rate, with little I/O wait.

  • Consistent latency, low CPUUtilization, low VolumeWriteIOPs, and no MainRequestQueuePendingRequests together show that you have excess capacity on the primary DB instance for processing write requests.

  • High CPUUtilization and high VolumeWriteIOPs but variable latency and MainRequestQueuePendingRequests together show that you are sending more work than the server can process in a given interval. Consider creating or resizing batch requests so as to do the same amount of work with less transactional overhead, or scaling the primary instance up to increase the number of query threads that can process write requests concurrently.

  • Low CPUUtilization with high VolumeWriteIOPs means that query threads are waiting for I/O operations to the storage layer to complete. If you see variable latencies and some increase in MainRequestQueuePendingRequests, consider creating or resizing batch requests so as to do the same amount of work with less transactional overhead (see the batching sketch after this list).
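As a rough illustration of the batching suggested above, the following sketch chains several inserts into a single Gremlin traversal, so that they travel in one request and commit together rather than paying per-write overhead. The endpoint and the data are placeholders:

    # Sketch: batch several vertex inserts into one Gremlin traversal.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    conn = DriverRemoteConnection(
        'wss://my-cluster.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin',
        'g')
    g = traversal().withRemote(conn)

    people = [('alice', 30), ('bob', 41), ('carol', 27)]

    # Chain the additions instead of issuing one request per vertex.
    t = g
    for name, age in people:
        t = t.addV('person').property('name', name).property('age', age)
    t.iterate()  # one round trip for the whole batch

    conn.close()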