Amazon Neptune storage reliability and high availability

Amazon Neptune is designed to be reliable, durable, and fault tolerant.

Neptune data is stored in a cluster volume, which is a single, virtual volume that uses solid-state drives (SSDs). All the DB instances in the cluster see the cluster volume as a single logical volume.

The virtual volume contains disk-volume copies of the DB cluster data in multiple Availability Zones within a single AWS Region. Neptune uses quorum writes that make six copies of your data across disk volumes in three Availability Zones. This makes Neptune storage highly durable, with a very low likelihood of data loss. The data is replicated automatically across the Availability Zones regardless of whether there are DB instances in them, and the amount of replication is independent of the number of DB instances in your cluster.
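The durability claim can be made concrete with a little arithmetic. The Python sketch below assumes the six copies are spread evenly, two per Availability Zone; this page states only that there are six copies across three zones, and the function name surviving_copies is hypothetical.

```python
"""Replica-count arithmetic for a six-copy, three-AZ cluster volume.

Assumption (not stated on this page): the six copies are spread evenly,
two per Availability Zone.
"""

TOTAL_COPIES = 6
AVAILABILITY_ZONES = 3
COPIES_PER_AZ = TOTAL_COPIES // AVAILABILITY_ZONES  # assumed even spread: 2


def surviving_copies(failed_azs: int, extra_failed_disks: int = 0) -> int:
    """Copies left after whole-AZ outages plus additional single-copy failures."""
    lost = failed_azs * COPIES_PER_AZ + extra_failed_disks
    return max(TOTAL_COPIES - lost, 0)


if __name__ == "__main__":
    for azs, disks in [(0, 1), (1, 0), (1, 1)]:
        left = surviving_copies(azs, disks)
        print(f"{azs} AZ(s) + {disks} extra disk(s) down -> {left} copies left")
```

Under that assumption, losing an entire Availability Zone still leaves four copies, and losing one further disk leaves three. This page does not state how many copies a quorum write or read requires, so the sketch stops at counting copies.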

Neptune cluster volumes automatically grow as the amount of data in your database increases. A Neptune cluster volume can grow to a maximum size of 64 tebibytes (TiB).

Graph size is limited to the size of the cluster volume. That is, the maximum graph size in a Neptune DB cluster is also 64 TiB.

Neptune Storage Auto-Repair

Neptune also automatically detects failures in the disk volumes that make up the virtual cluster volume. When a segment of a disk volume fails, Neptune immediately repairs that segment, using data in other disk volumes in the virtual cluster volume to ensure that the data in the repaired segment is current.

As a result, Neptune can avoid most data loss, which reduces the need to perform frequent point-in-time restores to recover from disk failure.

How Neptune Storage is Billed

Even though a Neptune cluster volume can grow up to 64 TiB, you are charged only for the space actually allocated. However, when Neptune data is removed, for example by a drop query such as g.V().drop(), the overall allocated space remains the same. The unused allocated space is reused automatically when the amount of data grows again.

Because storage costs are based on the storage "high water mark" (the maximum amount allocated to your Neptune DB cluster at any time during its existence), try to avoid ETL practices that create large amounts of temporary data, or that load large amounts of new data before removing unneeded older data.
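The billing behavior described above can be illustrated with a toy model. This is a minimal Python sketch under simplifying assumptions made for illustration, not a description of Neptune's actual allocator: allocation never shrinks, freed space is reused before new space is allocated, and sizes are in arbitrary units. The names ClusterVolumeModel, load_then_delete, and delete_then_load are hypothetical.

```python
"""Toy model of a storage high water mark (illustrative assumptions only).

Assumed behavior: allocated space only grows, freed space is reused before
new space is allocated, and the billed amount tracks the peak allocation.
"""

from dataclasses import dataclass


@dataclass
class ClusterVolumeModel:
    used: int = 0       # data currently stored
    allocated: int = 0  # space allocated so far (the high water mark)

    def write(self, size: int) -> None:
        self.used += size
        # Reuse free allocated space first; allocate more only when needed.
        self.allocated = max(self.allocated, self.used)

    def drop(self, size: int) -> None:
        # Removing data (for example with g.V().drop()) frees space for
        # reuse but does not shrink the allocation.
        self.used = max(self.used - size, 0)


def load_then_delete(old: int, new: int) -> int:
    vol = ClusterVolumeModel()
    vol.write(old)
    vol.write(new)  # new data lands while the old data is still present
    vol.drop(old)
    return vol.allocated


def delete_then_load(old: int, new: int) -> int:
    vol = ClusterVolumeModel()
    vol.write(old)
    vol.drop(old)   # remove unneeded data first
    vol.write(new)  # new data reuses the freed space
    return vol.allocated


if __name__ == "__main__":
    print("load, then delete:", load_then_delete(500, 500))
    print("delete, then load:", delete_then_load(500, 500))
```

Under this model, loading 500 units of new data before dropping 500 units of old data leaves a high water mark of 1000, while dropping first keeps it at 500. Dropping first may not be acceptable if the old data must remain queryable during the load, so the ordering is a trade-off rather than a rule.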

You can determine the current "high water mark" for your Neptune DB cluster by monitoring the VolumeBytesUsed CloudWatch metric (see Monitoring Neptune Using Amazon CloudWatch).
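A minimal Python sketch of reading this metric with the boto3 CloudWatch client's get_metric_statistics call, then expressing the peak against the 64 TiB volume limit. It assumes boto3 is installed and configured with credentials, that the metric is published in the AWS/Neptune namespace under a DBClusterIdentifier dimension (verify both against the monitoring page), and that my-neptune-cluster is a placeholder cluster identifier. The helper names are hypothetical.

```python
"""Sketch: read the VolumeBytesUsed peak for a Neptune DB cluster.

Assumptions: boto3 is installed and configured, the metric lives in the
"AWS/Neptune" namespace with a "DBClusterIdentifier" dimension, and the
cluster identifier passed in is a placeholder.
"""

from datetime import datetime, timedelta, timezone

TIB = 2 ** 40                # bytes per tebibyte
MAX_VOLUME_BYTES = 64 * TIB  # cluster volume limit stated on this page


def peak_bytes(datapoints: list) -> float:
    """Largest 'Maximum' value among get_metric_statistics datapoints."""
    return max((point["Maximum"] for point in datapoints), default=0.0)


def summarize(peak: float) -> str:
    """Express a byte count in TiB and as a share of the 64 TiB limit."""
    return f"{peak / TIB:.2f} TiB ({peak / MAX_VOLUME_BYTES:.1%} of 64 TiB)"


def fetch_volume_bytes_used(cluster_id: str, days: int = 14) -> list:
    """Fetch hourly VolumeBytesUsed maxima for the last `days` days."""
    import boto3  # imported lazily so the pure helpers work without boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Neptune",
        MetricName="VolumeBytesUsed",
        Dimensions=[{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Maximum"],
    )
    return response["Datapoints"]
```

With credentials configured, print(summarize(peak_bytes(fetch_volume_bytes_used("my-neptune-cluster")))) prints the peak over the last 14 days. boto3 is imported inside fetch_volume_bytes_used so that peak_bytes and summarize can also be used on datapoints saved earlier.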

If a substantial amount of your allocated storage is not being used, the only way to reset the high water mark is to export all the data in your graph and then reload it into a new DB cluster. Creating and restoring a snapshot does not reduce the amount of allocated storage, because the physical layout of the underlying storage remains unchanged.

For more Neptune pricing information, see Amazon Neptune Pricing.