PERF04-BP04 Choose data storage based on access patterns - Performance Efficiency Pillar


Use the workload's access patterns and the applications' requirements to decide on the optimal data services and technologies.

Desired outcome: Data storage has been selected based on identified and documented data access patterns. These might include the most common read, write, and delete queries, the need for ad hoc calculations and aggregations, the complexity of the data, data interdependencies, and the required consistency.
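One way to keep the identified access patterns documented and reviewable is to record them as structured data alongside the workload's design notes. The following Python sketch is an illustration only. The `AccessPattern` class, its fields, and the example values are hypothetical and are not part of any AWS API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # reads must reflect all previously committed writes
    EVENTUAL = "eventual"  # reads may briefly return stale data


@dataclass
class AccessPattern:
    """Hypothetical record describing one documented data access pattern."""
    name: str
    operation: str          # "read", "write", or "delete"
    query_shape: str        # human-readable description of the query
    expected_qps: float     # normal requests per second
    peak_qps: float         # peak requests per second
    consistency: Consistency
    needs_aggregation: bool = False
    related_entities: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject records that could not describe a real access pattern.
        if self.operation not in {"read", "write", "delete"}:
            raise ValueError(f"unknown operation: {self.operation}")
        if self.peak_qps < self.expected_qps:
            raise ValueError("peak_qps must be >= expected_qps")


patterns = [
    AccessPattern("order lookup", "read", "get order by order_id",
                  expected_qps=200, peak_qps=1500,
                  consistency=Consistency.STRONG),
    AccessPattern("daily revenue", "read", "sum order totals per day",
                  expected_qps=0.1, peak_qps=1,
                  consistency=Consistency.EVENTUAL, needs_aggregation=True,
                  related_entities=["orders", "customers"]),
]

# Aggregation-heavy patterns may belong in a separate analytical store.
analytical = [p.name for p in patterns if p.needs_aggregation]
print(analytical)  # ['daily revenue']
```

Listing the aggregation-heavy patterns separately makes it easier to spot the anti-pattern of serving transactional and analytical traffic from one shared database.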

Common anti-patterns:

  • You select only one database engine to simplify operations management.

  • You assume that data access patterns will stay consistent over time.

  • You implement complex transactions, rollback, and consistency logic in the application.

  • You configure the database to handle a potential high-traffic burst, which leaves its resources idle most of the time.

  • You use a shared database for both transactional and analytical workloads.

Benefits of establishing this best practice: Selecting and optimizing your data storage based on access patterns helps reduce development complexity and improve performance. Understanding when to use read replicas, global tables, data partitioning, and caching helps you reduce operational overhead and scale according to your workload's needs.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Identify and evaluate your data access patterns to select the correct storage configuration. Each database solution has options to configure and optimize its storage. Use the metrics and logs you have collected, and experiment with options to find the optimal configuration. Use the following overview to review the storage scaling options for each database service.

Storage scaling by AWS service:

  • Amazon RDS: Storage can be scaled up manually or configured to scale automatically, up to a maximum of 64 TiB depending on the engine type. Provisioned storage cannot be decreased.

  • Amazon Aurora: Storage scales automatically up to a maximum of 128 TiB and decreases when data is removed. The maximum storage size also depends on the specific Aurora MySQL or Aurora PostgreSQL engine version.

  • Amazon DynamoDB: Storage scales automatically. Tables are unconstrained in terms of size.

  • Amazon DocumentDB: Storage scales automatically up to a maximum of 64 TiB. Starting with Amazon DocumentDB 4.0, storage can decrease by a comparable amount when data is removed by dropping a collection or index. With Amazon DocumentDB 3.6, allocated space remains the same, and free space is reused when the data volume increases.

  • Amazon ElastiCache: Storage is in-memory and tied to the instance type or count.

  • Amazon Neptune: Storage scales automatically and can grow up to 128 TiB (or 64 TiB in a few Regions). When data is removed, the total allocated space remains the same and is reused in the future.

  • Amazon Timestream: Organizes your time series data to optimize query processing and reduce storage costs. The retention period can be configured through the in-memory and magnetic tiers.

  • Amazon Keyspaces: Scales table storage up and down automatically as your application writes, updates, and deletes data.

  • Amazon QLDB: Storage scales automatically. Tables are unconstrained in terms of size.
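The storage limits above matter when you project data growth. The following Python sketch estimates how many months remain before a database reaches a storage ceiling. It assumes a constant monthly compound growth rate, which is a simplifying model. The function name and example figures are illustrative, so confirm the current limits for your engine and Region in the service documentation.

```python
import math


def months_until_limit(current_gib: float, monthly_growth: float,
                       limit_gib: float) -> float:
    """Months m until current_gib * (1 + monthly_growth) ** m reaches limit_gib.

    Assumes constant compound growth (a simplifying, hypothetical model).
    Returns math.inf if the data is not growing.
    """
    if current_gib <= 0 or limit_gib <= 0:
        raise ValueError("sizes must be positive")
    if current_gib >= limit_gib:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    # Solve current * (1 + g) ** m = limit for m.
    return math.log(limit_gib / current_gib) / math.log1p(monthly_growth)


TIB = 1024  # GiB per TiB

# Example: 4 TiB today, growing 10% per month, against a 64 TiB ceiling.
m = months_until_limit(4 * TIB, 0.10, 64 * TIB)
print(f"{m:.1f} months")  # prints "29.1 months"
```

Under these assumed figures, roughly two and a half years remain to plan archival to Amazon S3, aggregation of historical data, or sharding before the ceiling is reached.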

Implementation steps:

  1. Understand whether the workload requires transactions, atomicity, consistency, isolation, and durability (ACID) compliance, and strongly consistent reads. Not every database supports these, and many NoSQL databases provide an eventual consistency model by default.

  2. Consider the traffic patterns, latency, and access requirements of a globally distributed application to identify the optimal storage solution.

  3. Analyze query patterns, random access patterns, and one-time queries. Also take into account any need for highly specialized query functionality, such as text search and natural language processing, time series, and graphs.

  4. Identify and document the anticipated growth of the data and traffic.

    1. Amazon RDS and Aurora support automatic storage scaling up to documented limits. Beyond these limits, consider transitioning older data to Amazon S3 for archival, aggregating historical data for analytics, or scaling horizontally by using sharding.

    2. DynamoDB and Amazon S3 automatically scale to a nearly unlimited storage volume.

    3. Amazon RDS instances and databases running on Amazon EC2 can be resized manually, and you can attach new Amazon EBS volumes to EC2 instances later for additional storage.

    4. Instance types can be changed as activity changes. For example, you can start with a smaller instance while you are testing, then scale the instance up as the service begins to receive production traffic. Aurora Serverless v2 automatically scales in response to changes in load.

  5. Establish baseline requirements for normal and peak performance, measured in transactions per second (TPS) and queries per second (QPS), and for consistency (ACID or eventual consistency).

  6. Document the deployment aspects of the solution and the database access requirements, such as global replication, Multi-AZ, read replicas, and multiple write nodes.
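To establish the normal and peak baselines described in step 5, you can aggregate request timestamps from your logs into per-second counts. The following Python sketch assumes you have already extracted one Unix timestamp (in seconds) per request from your logs. That input format, and the use of a nearest-rank 99th percentile as the "peak" figure, are assumptions rather than AWS guidance. Run it separately over write transactions (TPS) and read queries (QPS).

```python
import math
from collections import Counter


def per_second_baseline(timestamps: list[float]) -> dict[str, float]:
    """Summarize load from per-request Unix timestamps.

    Returns the mean rate over the observed window, a nearest-rank
    99th percentile of per-second counts, and the busiest single second.
    Seconds inside the window with no requests count as zero.
    """
    if not timestamps:
        raise ValueError("no timestamps")
    buckets = Counter(int(t) for t in timestamps)  # requests per whole second
    start, end = min(buckets), max(buckets)
    counts = sorted(buckets.get(s, 0) for s in range(start, end + 1))
    rank = math.ceil(0.99 * len(counts))           # nearest-rank percentile
    return {
        "mean_qps": len(timestamps) / len(counts),
        "p99_qps": float(counts[rank - 1]),
        "max_qps": float(counts[-1]),
    }


# Example: 3 requests in second 100, none in 101, 1 request in 102.
stats = per_second_baseline([100.1, 100.5, 100.9, 102.2])
print(stats)  # mean_qps is 4/3; p99_qps and max_qps are both 3.0
```

Comparing the mean with the percentile and maximum shows how bursty the workload is. This helps you avoid sizing the database for a rare burst that leaves resources idle most of the time.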

Level of effort for the implementation plan: Low. If you do not have logs or metrics for your data management solution, you need to put them in place before you can identify and document your data access patterns. Once you understand your data access patterns, selecting and configuring your data storage requires a low level of effort.
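Once logs are available, a first pass at identifying access patterns can be as simple as counting operations and normalized query shapes. The following Python sketch assumes a hypothetical log format in which each entry is a single SQL statement string. The regular expressions used for normalization are deliberately simplistic, and the rule that treats every statement other than SELECT and DELETE as a write is a simplifying assumption. Both would need adjustment for a real workload.

```python
import re
from collections import Counter

# Simplistic patterns (assumption): quoted strings and numeric literals.
_LITERALS = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")
_SPACES = re.compile(r"\s+")


def normalize(sql: str) -> str:
    """Replace literals with '?' so queries with the same shape group together."""
    return _SPACES.sub(" ", _LITERALS.sub("?", sql)).strip().lower()


def summarize(statements: list[str]) -> tuple[Counter, Counter]:
    """Return (operation mix, query-shape counts) for a list of SQL statements."""
    ops: Counter = Counter()
    shapes: Counter = Counter()
    for sql in statements:
        if not sql.strip():
            continue  # skip blank log lines
        verb = sql.strip().split(None, 1)[0].upper()
        # SELECT is a read, DELETE is a delete; anything else counts as a write.
        ops["read" if verb == "SELECT" else
            "delete" if verb == "DELETE" else "write"] += 1
        shapes[normalize(sql)] += 1
    return ops, shapes


log = [
    "SELECT * FROM orders WHERE id = 42",
    "SELECT * FROM orders WHERE id = 7",
    "UPDATE orders SET status = 'shipped' WHERE id = 7",
    "DELETE FROM carts WHERE id = 3",
]
ops, shapes = summarize(log)
print(ops)                    # operation mix: read 2, write 1, delete 1
print(shapes.most_common(1))  # most frequent query shape
```

The read, write, and delete mix and the most frequent query shapes are the raw material for the documented access patterns described in the desired outcome.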

