Database Architecture Selection
The optimal database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various sub-systems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency.
Understand data characteristics: Understand the different characteristics of data in your workload. Determine if the workload requires transactions, how it interacts with data, and what its performance demands are. Use this information to select the best performing database approach for your workload (for example, relational databases, NoSQL key-value, document, wide column, graph, time series, or in-memory storage).
You can choose from many purpose-built database engines including relational, key-value, document, in-memory, graph, time series, and ledger databases. By picking the best database to solve a specific problem (or a group of problems), you can break away from restrictive one-size-fits-all monolithic databases and focus on building applications to meet the needs of your customers.
Relational databases store data with predefined schemas and
relationships between them. These databases are designed to
support ACID (atomicity, consistency, isolation, durability)
transactions, and maintain referential integrity and strong data
consistency. Many traditional applications, enterprise resource
planning (ERP), customer relationship management (CRM), and
e-commerce use relational databases to store their data. You can
run many of these database engines on Amazon EC2, or choose from
one of the AWS managed database services.
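As a minimal sketch of what an ACID transaction provides, the example below uses Python's built-in sqlite3 module as a stand-in for any relational engine. The `accounts` table and the `transfer` helper are hypothetical names chosen for illustration. A transfer that would overdraw an account fails partway through, and the engine rolls back the half of the work that had already succeeded.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to account `dst` atomically."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # This debit violates the CHECK constraint when funds are insufficient.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds and is committed
try:
    transfer(conn, 1, 2, 500)  # credit succeeds, debit fails, both are undone
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

After the failed transfer, `balances` reflects only the first, committed transfer: the credit applied before the failure does not survive.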
Key-value databases are optimized for common access patterns, typically to store and retrieve large volumes of data. These databases deliver quick response times, even under extreme volumes of concurrent requests.
High-traffic web apps, e-commerce systems, and gaming applications
are typical use cases for key-value databases. In AWS, you can
use Amazon DynamoDB.
In-memory databases are used for applications that require
real-time access to data. By storing data directly in memory,
these databases deliver microsecond latency to applications for
which millisecond latency is not enough. You may use in-memory
databases for application caching, session management, gaming
leaderboards, and geospatial applications. In AWS, you can use
Amazon ElastiCache.
A document database is designed to store semi-structured data as
JSON-like documents. These databases help developers build and
update applications such as content management, catalogs, and user
profiles quickly. In AWS, you can use Amazon DocumentDB.
A wide column store is a type of NoSQL database. It uses tables, rows, and columns, but
unlike a relational database, the names and format of the columns can vary from row to row in
the same table. You typically see a wide column store in high scale industrial apps for
equipment maintenance, fleet management, and route optimization. In AWS, you can use Amazon Keyspaces (for Apache Cassandra).
Graph databases are for applications that must navigate and query
millions of relationships between highly connected graph datasets
with millisecond latency at large scale. Many companies use graph
databases for fraud detection, social networking, and
recommendation engines.
In AWS, you can use Amazon Neptune.
Time-series databases efficiently collect, synthesize, and derive
insights from data that changes over time. IoT applications,
DevOps, and industrial telemetry can utilize time-series
databases.
In AWS, you can use Amazon Timestream.
Ledger databases provide a centralized and trusted authority to
maintain a scalable, immutable, and cryptographically verifiable
record of transactions for every application. We see ledger
databases used for systems of record, supply chain, registrations,
and even banking transactions.
In AWS, you can use Amazon Quantum Ledger Database (QLDB).
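The mapping from data characteristics to a database category can be sketched as a small rule table. Everything in this example is an illustrative assumption: the `DataProfile` fields, the rule order, and the key-value default are hypothetical simplifications of the descriptions above, not an official selection procedure. Real selection should also weigh benchmarks and operational constraints.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProfile:
    """Hypothetical yes/no summary of a workload's data characteristics."""
    needs_verifiable_history: bool = False
    needs_multi_row_transactions: bool = False
    relationship_heavy: bool = False
    timestamped_measurements: bool = False
    needs_microsecond_reads: bool = False
    semi_structured_documents: bool = False
    sparse_columns_at_scale: bool = False

# Ordered rules: the first matching characteristic wins.
# The ordering is an assumption made for this sketch.
RULES = [
    ("needs_verifiable_history", "ledger"),
    ("needs_multi_row_transactions", "relational"),
    ("relationship_heavy", "graph"),
    ("timestamped_measurements", "time series"),
    ("needs_microsecond_reads", "in-memory"),
    ("semi_structured_documents", "document"),
    ("sparse_columns_at_scale", "wide column"),
]

def suggest_category(profile: DataProfile) -> str:
    """Return a candidate database category for the given profile."""
    for attribute, category in RULES:
        if getattr(profile, attribute):
            return category
    # Simple lookups by key with no other strong requirement.
    return "key-value"

fraud_detection = DataProfile(relationship_heavy=True)
order_system = DataProfile(
    needs_multi_row_transactions=True, semi_structured_documents=True
)
print(suggest_category(fraud_detection))
print(suggest_category(order_system))
```

A workload with several sub-systems would typically be profiled per sub-system, which is how one system ends up using more than one database.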
Evaluate the available options: Evaluate the services and storage options that are available as part of the selection process for your workload's storage mechanisms. Understand how, and when, to use a given service or system for data storage. Learn about available configuration options that can optimize database performance or efficiency, such as provisioned IOPS, memory and compute resources, and caching.
Database solutions generally have configuration options that allow you to optimize for the type of workload. Using benchmarking or load testing, identify database metrics that matter for your workload. Consider the configuration options for your selected database approach such as storage optimization, database level settings, memory, and cache.
Evaluate database caching options for your workload. The three most common types of database caches are the following:
- Database integrated caches: Some databases (such as Amazon Aurora) offer an integrated cache that is managed within the database engine and has built-in write-through capabilities.
- Local caches: A local cache stores your frequently used data within your application. Because reads are served from the application's own memory, no network round trip is needed, which makes data retrieval faster than with other caching architectures.
- Remote caches: Remote caches are stored on dedicated servers and are typically built upon key/value NoSQL stores such as Redis and Memcached. They provide up to a million requests per second per cache node.
For Amazon DynamoDB workloads, DynamoDB Accelerator (DAX) provides a fully managed in-memory cache.
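To make the local cache option concrete, here is a minimal sketch of an in-process cache with a time-to-live, placed in front of a slower lookup. The class name `LocalTTLCache`, the `load_user` loader, and the 30-second TTL are assumptions for illustration only. The loader stands in for a real database query.

```python
import time

class LocalTTLCache:
    """In-process cache: load on a miss, serve from memory until the TTL expires."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # called with the key on a miss
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the backing store and remember the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, for example after the application writes to the database."""
        self._entries.pop(key, None)

calls = []

def load_user(user_id):
    # Stand-in for a database round trip; records each call for inspection.
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}

cache = LocalTTLCache(load_user, ttl_seconds=30)
cache.get(1)   # miss: calls load_user
cache.get(1)   # hit: served from application memory
```

The trade-off of this design is staleness: a cached value can lag the database by up to the TTL unless the application calls `invalidate` on writes. A local cache is also per process, so separate application instances do not share entries.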
Collect and record database performance metrics: Use tools, libraries, and systems that record performance measurements related to database performance. For example, measure transactions per second, slow queries, or system latency introduced when accessing the database. Use this data to understand the performance of your database systems.
Instrument as many database activity metrics as you can gather
from your workload. These metrics may need to be published
directly from the workload or gathered from an application
performance management service. You can use AWS X-Ray to trace
requests as they pass through your application.
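When metrics are published directly from the workload, the instrumentation can be as small as a timer around each database call. The sketch below is one hypothetical way to do that in application code. `QueryMetrics`, its 100 ms slow-query threshold, and the nearest-rank percentile are assumptions for illustration, not part of any AWS library.

```python
import math
import time
from contextlib import contextmanager

class QueryMetrics:
    """Collects per-query durations and flags slow queries."""

    def __init__(self, slow_threshold_s=0.1):
        self.slow_threshold_s = slow_threshold_s
        self.durations = []      # seconds, one entry per query
        self.slow_queries = []   # (label, seconds) at or above the threshold

    @contextmanager
    def timed(self, label):
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(label, time.perf_counter() - start)

    def record(self, label, seconds):
        self.durations.append(seconds)
        if seconds >= self.slow_threshold_s:
            self.slow_queries.append((label, seconds))

    def percentile(self, pct):
        """Nearest-rank percentile of recorded durations, pct in (0, 100]."""
        if not self.durations:
            raise ValueError("no samples recorded")
        ordered = sorted(self.durations)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]

metrics = QueryMetrics(slow_threshold_s=0.1)
with metrics.timed("example-query"):
    sum(range(1000))  # stand-in for a database call
```

Tail percentiles such as p95 or p99 are usually more informative than the mean here, because a small share of slow queries can dominate user-visible latency.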
Choose data storage based on access patterns: Use the access patterns of the workload to decide which services and technologies to use. For example, use a relational database for workloads that require transactions, or a key-value store where higher throughput matters more and eventual consistency is acceptable.
Optimize data storage based on access patterns and metrics: Use performance characteristics and access patterns that optimize how data is stored or queried to achieve the best possible performance. Measure how optimizations such as indexing, key distribution, data warehouse design, or caching strategies impact system performance or overall efficiency.
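One way to check whether an indexing change actually alters how a query runs is to compare the query plan before and after. The sketch below uses Python's built-in sqlite3 module and SQLite's `EXPLAIN QUERY PLAN` as a stand-in for the equivalent tooling in other engines. The `orders` table, the index name, and the `query_plan` helper are hypothetical.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for `sql` as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)
conn.commit()

sql = "SELECT total FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))   # expected to be a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))    # expected to search via the new index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies by SQLite version, so treat the printed text as diagnostic output. A plan change alone does not prove a benefit: pair it with latency measurements under realistic load, since indexes also add cost to writes.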