Best practices for modeling relational data in DynamoDB - Amazon DynamoDB

Best practices for modeling relational data in DynamoDB

Traditional relational database management systems (RDBMS) store data in a normalized relational structure. The objective of the relational data model is to reduce the duplication of data (through normalization) to support referential integrity and reduce data anomalies. The following schema is an example of a relational data model for a generic order-entry application that supports a human resources schema backing the operational and business support systems of a theoretical manufacturer.

      Example RDBMS schema.

How DynamoDB eliminates the need for JOIN operations

A RDBMS uses a structure query language (SQL) to return data to the application. Due to the normalization of the data model, such queries typically require the use of the JOIN operator to combine data from one or more tables.

For example, to generate a list of purchase order items sorted by the quantity in stock at all warehouses that can ship each item, you could issue the following SQL query against the preceding schema.

SELECT * FROM Orders INNER JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID ORDER BY Quantity_on_Hand DESC

SQL queries of this kind can provide a flexible API for accessing data, but they require a significant amount of processing. Each join in the query increases the runtime complexity of the query as the data for each table must stage and then be assembled to return the result set. Additional factors that can impact how long it takes the queries to execute are the size of the tables and whether the columns being joined have indexes. The preceding query initiates complex queries across several tables and then sorts the result set.

Eliminating the need for JOINs is at the heart of NoSQL data modeling. This is why we built DynamoDB to support, and why DynamoDB can deliver consistent performance at any scale. Given the runtime complexity of SQL queries and JOINs, RBDMS performance is not constant at scale, which causes performance issues as customer applications grow.

While normalizing data does reduce the amount of data stored to disk, often the most constrained resources that impact performance are CPU time and network latency. DynamoDB is built to minimize both constraints by eliminating JOINs (and encouraging denormalization of data) and optimizing the database architecture to fully answer an application query with a single request to an item. These qualities enable DynamoDB to provide single digit millisecond performance at any scale as the runtime complexity for DynamoDB operations is constant, regardless of data size, for common access patterns.

How DynamoDB transactions eliminate overhead to the write process

Another factor that can slow down a RDBMS is the use of transactions to write to a normalized schema. As shown in the example, relational data structures used by most online transaction processing (OLTP) applications must be broken down and distributed across multiple logical tables when they are stored in an RDBMS. Therefore, an ACID-compliant transaction framework is necessary to avoid race conditions and data integrity issues that could occur if an application tries to read an object that is in the process of being written. Such a transaction framework, when coupled with a relational schema, can add significant overhead to the write process.

The implementation of transactions in DynamoDB prohibits common scaling issues found with a RDBMS by issuing a transaction as a single API call and bounding the number of items that can be accessed in that single transaction. Long-running transactions can cause operational issues by holding locks on the data either for a long time, or perpetually, as the transaction is never closed. To prevent such issues in DynamoDB, transactions were implemented with two distinct APIs: TransactWriteItems and TransactGetItems. These APIs do not have begin and end semantics that are common in a RDBMS. Further, DynamoDB has a 100 item access limit within a transaction to similarly prevent long-running transactions. To learn more about DynamoDB transactions, see Working with Transactions.

For these reasons, when your business requires low-latency response to high-traffic queries, taking advantage of a NoSQL system generally makes technical and economic sense. Amazon DynamoDB helps solve the problems that limit relational system scalability by avoiding them.

The performance of a RDBMS does not typically scale well for the following reasons:

  • It uses expensive joins to reassemble required views of query results.

  • It normalizes data and stores it on multiple tables that require multiple queries to write to disk.

  • It generally incurs the performance costs of an ACID-compliant transaction system.

DynamoDB scales well for these reasons:

  • Schema flexibility lets DynamoDB store complex hierarchical data within a single item.

  • Composite key design lets it store related items close together on the same table.

  • Transactions are performed in a single operation and limits the number of items that can be accessed to 100 to avoid long-running operations

Queries against the data store become much simpler, often in the following form:

SELECT * FROM Table_X WHERE Attribute_Y = "somevalue"

DynamoDB does far less work to return the requested data compared to the RDBMS in the earlier example.