Cost optimization pillar
The cost optimization pillar of the AWS Well-Architected Framework focuses on avoiding unnecessary costs. The following recommendations can help you meet the cost optimization design principles and architectural best practices for Amazon Neptune.
The cost optimization pillar focuses on the following key areas:
- Understanding spending over time and controlling fund allocation
- Selecting resources of the right type and quantity
- Scaling to meet business needs without overspending
Understand usage patterns and services needed
Neptune is a good fit for your workload if your data model has a discernible graph structure, and your queries need to explore relationships and traverse multiple hops. A graph database isn't a good fit for the following patterns:
- Mainly single-hop queries (consider whether your data might be better represented as attributes of an object)
- JSON or BLOB data stored as properties
- Queries that aggregate across a dataset, such as calculating the sum of a numeric property across a large number of nodes
Consider whether using several purpose-built databases together for specific access patterns might address all of your needs. For example:
- An API that requires less frequent complex graph navigations alongside highly concurrent retrieval of properties for a single node might be best served by using one or more of Neptune, DynamoDB, or Amazon DocumentDB.
- Relational databases can coexist with Neptune to maintain your existing functionality; use Neptune only for multiple-hop traversals that do not perform and scale well in relational databases.
Understand the costs associated with services that interact with and complement Neptune, including the following:
- Amazon Simple Storage Service (Amazon S3) storage costs for data files being bulk loaded into Neptune
- Lambda functions used for insert or upsert queries, read queries, and Neptune streams processing
- The API layer built on Neptune to interact with the client application (instead of having direct connections to the database) in Amazon API Gateway or AWS AppSync
- AWS Glue jobs used to transfer data to and from Neptune
- Amazon Kinesis or Amazon Managed Streaming for Apache Kafka (Amazon MSK) instances receiving streaming data for near real-time ingestion into Neptune
- AWS Database Migration Service for migration of relational data to Neptune
- Amazon SageMaker runtime costs for Jupyter notebooks and Deep Graph Library machine learning models
Select resources with attention to cost
Review Neptune pricing, and ask the following questions:
- Does the MainRequestQueuePendingRequests CloudWatch metric stay at a consistently low number near zero?
- Does the BufferCacheHitRatio CloudWatch metric stay at or above 99.9 percent a majority of the time?
- What are the cost and performance curves for instance costs and for associated data I/O costs? Data read costs might increase significantly with an undersized instance that requires frequent buffer cache swapping with storage. BufferCacheHitRatio will drop frequently in these scenarios.
Instance costs scale linearly with size within the same instance family. The hourly cost of the db.r6i.2xlarge instance is twice that of the db.r6i.xlarge instance, and it also has twice the resource allocation. The db.r6i.24xlarge instance costs 24 times the hourly rate of the db.r6i.xlarge instance.
Estimate the number of concurrent queries you must support. You can have between zero and fifteen read replicas for processing read-only queries. If your requirements vary by the time of day, week, or month, you can use multiple smaller instances to scale on a schedule. Each vCPU on an instance provides two threads for handling concurrent queries. Three db.r6i.xlarge read replicas, with 4 vCPUs each, can handle 24 concurrent queries.
If your traffic volume is instead measured in queries per second (QPS), you must experiment to determine the average latency of your queries. The number of queries per second a Neptune cluster can support is equal to vCPUs × 2 × (1 second / average query latency). For example, if you have 4 vCPUs and a query latency of 100 milliseconds (0.1 second), QPS = 4 × 2 × (1 s / 0.1 s) = 80 queries per second.
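The formula above can be checked with a few lines of Python. This is a back-of-envelope estimate for capacity planning, not a sizing guarantee:

```python
def estimated_qps(vcpus: int, avg_latency_seconds: float) -> float:
    """Estimate maximum queries per second for a Neptune instance.

    Each vCPU provides two query threads, and each thread can complete
    (1 / average latency) queries per second.
    """
    threads = vcpus * 2
    return threads * (1.0 / avg_latency_seconds)

# Example from the text: 4 vCPUs, 100 ms (0.1 s) average query latency.
print(estimated_qps(4, 0.1))  # 80.0
```

Measure the average latency under realistic load, because latency itself rises as concurrency approaches the thread limit.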
Provisioned instances are cheaper than serverless for continuous, stable, and predictable workloads. Serverless provides opportunities for optimizing costs when you have a workload that requires very high usage for just a few hours per day (for example, db.r6i.4xlarge) and then almost no traffic for the remainder of the day (for example, 1 Neptune capacity unit, or NCU). A serverless instance that scales up for a few hours and then back down will be less expensive than running a provisioned db.r6i.4xlarge instance all day.
Choose the best Neptune instance configuration for your workload
For entry-level experimentation with Neptune, you can use the AWS Free Tier. The db.t3.medium and db.t4g.medium instance usage included in the Free Tier is enough for you to get a good understanding of Neptune at low scale. Your cluster will remain after the free trial period ends, although you will be charged for usage from that point forward.
The db.t3.medium and db.t4g.medium instances are good for low-cost development environments, but be aware that they have a smaller RAM-to-vCPU ratio (2:1) than the R family instances (8:1) or X family instances (16:1). Performance profiles might differ from those classes, especially regarding OutOfMemoryExceptions and when queries navigate across a significant portion of the graph. To determine whether the latter condition affects you, check the BufferCacheHitRatio CloudWatch metric.
We strongly advise against doing any performance or load testing with T family instances because you might experience inconsistent results that are not indicative of a production environment.
Provisioned instances give you the best cost and performance combination when your workload is fairly stable and predictable. Choose the instance size based on the request concurrency required and the query complexity. Higher concurrency requires more vCPUs. Higher query complexity requires more RAM. Use the MainRequestQueuePendingRequests CloudWatch metric to determine the impact of the former (a value greater than zero represents more concurrent requests than can be handled). Use the BufferCacheHitRatio CloudWatch metric to determine the impact of the latter. A ratio that frequently falls below 99.9 percent suggests that there isn't enough RAM to contain the working portion of the graph being evaluated, which results in more frequent cache swapping. If the R family of instances provides sufficient concurrency but not enough RAM, consider trying the X family of instances.
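The two metric signals described above can be combined into a simple decision sketch. The thresholds come from the guidance in this section; the function itself is illustrative and not part of any AWS API — in practice you would feed it samples retrieved from CloudWatch:

```python
def sizing_hint(pending_requests: list[float], cache_hit_ratio: list[float]) -> str:
    """Suggest a scaling direction from recent samples of the
    MainRequestQueuePendingRequests and BufferCacheHitRatio (percent)
    CloudWatch metrics. Illustrative heuristic only.
    """
    # Any queued requests mean concurrency exceeds the available vCPU threads.
    queueing = any(v > 0 for v in pending_requests)
    # A hit ratio frequently below 99.9% means the working set exceeds RAM.
    low_cache = sum(1 for v in cache_hit_ratio if v < 99.9) > len(cache_hit_ratio) / 2
    if queueing and low_cache:
        return "scale up: more vCPUs and more RAM"
    if queueing:
        return "more vCPUs (higher concurrency)"
    if low_cache:
        return "more RAM (consider the X family)"
    return "current size is adequate"

print(sizing_hint([0, 0, 2], [99.95, 99.99, 99.97]))  # more vCPUs (higher concurrency)
```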
Ideal use cases for serverless instances are described in the Neptune documentation. If you are unsure whether provisioned or serverless is best for you, and cost is your primary concern, test your workload in serverless to determine the number of NCUs used, and compare the cost of provisioned (N hours × hourly provisioned cost) with serverless (sum of NCU-hours × hourly cost per NCU). If you are unsure about the equivalent-sized provisioned instance, one NCU is approximately equivalent to 2 GB of RAM with associated vCPU and networking. If your provisioned instance is from the R6i family, the ratio is 1 vCPU per 8 GB of RAM, or 4 NCUs, along with associated networking.
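The comparison above can be sketched numerically. The rates below are placeholders, not actual Neptune prices (check the current pricing page); the NCU count uses the 4-NCUs-per-R6i-vCPU ratio from the text, so a db.r6i.4xlarge (16 vCPUs) corresponds to roughly 64 NCUs:

```python
def provisioned_cost(hours: float, hourly_rate: float) -> float:
    """N hours × hourly provisioned cost."""
    return hours * hourly_rate

def serverless_cost(ncu_hours: float, rate_per_ncu_hour: float) -> float:
    """Sum of NCU-hours × hourly cost per NCU."""
    return ncu_hours * rate_per_ncu_hour

# Hypothetical rates for illustration only.
R6I_4XLARGE_HOURLY = 2.60   # placeholder provisioned rate
NCU_HOURLY = 0.16           # placeholder serverless rate per NCU-hour

# Spiky workload: 4 hours/day at ~64 NCUs, then 20 hours near the 1-NCU floor.
daily_serverless = serverless_cost(4 * 64 + 20 * 1, NCU_HOURLY)
daily_provisioned = provisioned_cost(24, R6I_4XLARGE_HOURLY)
print(daily_serverless < daily_provisioned)  # serverless wins for this shape
```

For a flat 24-hour workload at the same peak, the serverless total would far exceed the provisioned rate, which is why stable workloads favor provisioned instances.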
When using serverless for primary and replica instances, remember that read replicas in promotion tiers 0 and 1 will scale their NCUs in line with the writer instance so that they are properly scaled if a failover event occurs. Set your NCU limits for these instances based on which of your instances—writer or readers—receive the most traffic.
In environments where the cluster is not needed 24 hours per day, 7 days a week, consider writing scripts that will turn off the Neptune instances when not in use and start them again before they are used. Neptune instances will automatically restart every 7 days to ensure required maintenance updates are applied. If you intend to leave the instances off for long durations, use a weekly script to shut them down again.
Right-size data storage and transfer
More efficient queries (for example, queries that need to touch fewer nodes, edges, and properties in the graph) require less I/O transfer and potentially can make use of smaller instances because less buffer cache is required. Use the profile or explain endpoints for your query language to optimize your query, and consider optimizing your graph model for your query performance.
Neptune uses dictionary encoding on large strings, and that dictionary is optimized for performance, not efficiency. If you have large BLOBs, JSON, or frequently changing strings for properties, consider storing them outside Neptune in Amazon S3, Amazon DynamoDB, or Amazon DocumentDB, and store only a reference within the Neptune node.
In some cases, choosing a larger instance size can be cheaper. If your I/O costs are very high because of a low BufferCacheHitRatio, a larger buffer cache might significantly reduce that cost, because all of the data would fit in the cache instead of being frequently swapped from storage and incurring the I/O transfer rate.
Neptune uses copy-on-write cloning. When cloning to split a graph into multiple shards, it might be more efficient not to delete the unwanted data on the cloned cluster because that will involve the creation of new data pages, resulting in increased storage costs. Data that is unchanged from before the cloning event will exist in a single data page shared across the two clusters and will be charged only for that single copy.
Do not enable the OSGP index or use R5d instances unless you have tested to confirm that they make a substantial difference in your workload. Both are designed for rarely occurring scenarios, and they might increase your costs for minimal or no gains.