Abstract and introduction - Best Practices for Migrating from RDBMS to Amazon DynamoDB

Abstract and introduction

Publication date: February 28, 2022

Abstract

Software architects and developers have an array of choices for data storage and persistence. These include not only traditional relational database management systems (RDBMS), but also NoSQL databases, such as Amazon DynamoDB. Certain workloads will scale better and be more cost-effective to run using a NoSQL solution. This whitepaper highlights the best practices for migrating these workloads from an RDBMS to DynamoDB. It also discusses how NoSQL databases like DynamoDB differ from a traditional RDBMS, and proposes a framework for analysis, data modeling, and migration of data from an RDBMS into DynamoDB.

Introduction

For decades, the RDBMS was the de facto choice for data storage and persistence. Any data-driven application, be it an e-commerce website or an expense reporting system, was almost certain to use a relational database to store and retrieve the data it required. The reasons for this are numerous and include the following:

  • RDBMS is a mature and stable technology.

  • The query language, SQL, is feature-rich and versatile.

  • The servers that run an RDBMS engine are typically some of the most stable and powerful in the IT infrastructure.

  • All major programming languages support the drivers used to communicate with an RDBMS, and a rich set of tools exists to simplify the development of database-driven applications.

These factors, and many others, have supported the wide adoption of the RDBMS. For architects and software developers, there simply wasn’t a reasonable alternative for data storage and persistence – until now.

The growth of internet-scale web applications, such as e-commerce and social media, the explosion of connected devices like smartphones and tablets, and the rise of big data have produced new workloads that traditional relational databases are not well suited to handle. As systems designed for transaction processing, all RDBMS are expected to support certain fundamental properties, defined by the acronym ACID: Atomicity, Consistency, Isolation, and Durability.

Atomicity refers to all-or-nothing operations: a transaction either completes in full or has no effect at all. Consistency means that a transaction either moves the data from one valid state to another or is cancelled; once the transaction is committed, the resulting data must conform to the constraints imposed by the database schema. Isolation requires that concurrent transactions not interfere with one another: running them concurrently must leave the data in the same end state as running them one after another (serially). Durability requires that the effects of a committed transaction be preserved. In the event of power or system failure, the database must be able to recover to the last committed state.
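To make atomicity and consistency concrete, the following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in relational engine. SQLite, the `accounts` table, and the `open_bank`, `transfer`, and `balances` helpers are assumptions chosen for illustration, not part of any migration tooling. The `CHECK` constraint plays the role of a schema rule, and wrapping both updates in one transaction ensures that a rejected debit also undoes the matching credit.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database whose schema forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the seed rows as one transaction
        conn.executemany(
            "INSERT INTO accounts (id, balance) VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as a single transaction.

    Using the connection as a context manager commits if the block succeeds
    and rolls back if it raises, so either both updates persist or neither does.
    """
    try:
        with conn:
            # Credit first, so a failing debit must also undo this update.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = open_bank()
    print(transfer(conn, "alice", "bob", 30))   # True
    print(transfer(conn, "bob", "alice", 500))  # False: bob would go negative
    print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

The second transfer fails the `CHECK` constraint. Because both updates share one transaction, the 500 already credited to `alice` is rolled back as well, leaving the data in its last valid state.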

These ACID properties are all desirable, but supporting all four requires an architecture that poses some challenges for today’s data-intensive workloads. For example, consistency requires a well-defined schema and requires that all data stored in a database conform to that schema. This works well for ad hoc queries and read-heavy workloads. For a workload consisting almost entirely of writes, such as saving a player’s state in a gaming application, this schema enforcement is expensive from a storage and compute standpoint. The game developer gains little by forcing this data into rows and tables that relate to one another through a well-defined set of keys.

Consistency also requires locking some portion of the data until the transaction modifying it completes, and then making the change immediately visible. For a bank transaction, which debits one account and credits another, this is required. This behavior is called strong consistency. For a social media application, on the other hand, there is no real requirement that all users see an update to a data feed at precisely the same time. In this latter case, eventual consistency is sufficient: every user eventually sees the update, though not necessarily at the same moment. It is far more important that the social media application scale to handle potentially millions of simultaneous users, even if those users see changes to the data at different times. Scaling an RDBMS to handle this level of concurrency while maintaining strong consistency requires upgrading to more powerful (and often proprietary) hardware. This is called scaling up, or vertical scaling; it usually carries an extremely high cost and has an upper scalability limit. A far more cost-effective way to support this level of concurrency is to add server instances running on commodity hardware, which is called scaling out, or horizontal scaling.
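DynamoDB lets the caller make this choice per request: read operations such as `GetItem` and `Query` accept a `ConsistentRead` parameter, and reads are eventually consistent unless it is set to true. The sketch below is a deliberately simplified toy model (a hypothetical `ToyReplicatedStore` with one leader and one lagging follower), not a description of how DynamoDB replicates data internally. It only shows why an eventually consistent read can briefly return stale data while a read served by the up-to-date copy cannot.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicatedStore:
    """Hypothetical toy model: one leader plus one follower replica.

    Writes land on the leader immediately and are queued for the follower;
    they reach the follower only when `replicate()` is called, simulating lag.
    """
    leader: dict = field(default_factory=dict)
    follower: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def put(self, key, value):
        self.leader[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply queued writes to the follower in the order they were made.
        for key, value in self.pending:
            self.follower[key] = value
        self.pending.clear()

    def get(self, key, consistent_read=False):
        # A strongly consistent read goes to the leader and sees every
        # acknowledged write; an eventually consistent read is served by the
        # follower and may return stale (or missing) data.
        source = self.leader if consistent_read else self.follower
        return source.get(key)


if __name__ == "__main__":
    store = ToyReplicatedStore()
    store.put("feed:42", "v1")
    print(store.get("feed:42", consistent_read=True))  # v1
    print(store.get("feed:42"))                        # None (not replicated yet)
    store.replicate()
    print(store.get("feed:42"))                        # v1
```

For a social feed, the stale read in the middle is harmless because the follower catches up shortly afterward. For the bank transfer, only the `consistent_read=True` path is acceptable.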

NoSQL databases, such as Amazon DynamoDB, address the scaling and performance challenges found with RDBMS. The term NoSQL simply means that the database doesn’t follow the relational model espoused by E. F. Codd in his 1970 paper A Relational Model of Data for Large Shared Data Banks, which became the basis for all modern RDBMS. As a result, NoSQL databases vary much more widely in features and functionality than traditional RDBMS. There is no common query language analogous to SQL, and query flexibility is generally traded for high I/O performance and horizontal scalability. NoSQL databases also don’t enforce schema in the same way as an RDBMS: they may store semi-structured data such as JSON documents, store related values as column sets, or simply store key/value pairs.
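DynamoDB’s low-level API makes this schema flexibility visible. Every attribute value carries its own type descriptor, such as `S` for strings, `N` for numbers (sent as strings), `BOOL`, `NULL`, `L` for lists, and `M` for maps, and only the key attributes are fixed by the table definition. The hypothetical `to_attribute_value` and `to_item` helpers below convert a subset of Python values into that shape; in practice, boto3’s `TypeSerializer` handles this for you. The `PlayerId`, `Level`, `Inventory`, and `Settings` attributes are illustrative assumptions echoing the game state example.

```python
from decimal import Decimal
from typing import Any


def to_attribute_value(value: Any) -> dict:
    """Convert a Python value into DynamoDB's low-level AttributeValue shape.

    Hypothetical helper covering a subset of types: str, bool, int, Decimal,
    None, list, and dict. Floats, sets, and binary values are left out.
    """
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings on the wire
    if value is None:
        return {"NULL": True}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record: dict) -> dict:
    """Serialize a whole record; each attribute is typed independently."""
    return {name: to_attribute_value(v) for name, v in record.items()}


if __name__ == "__main__":
    # Two items destined for the same table with different non-key attributes.
    print(to_item({"PlayerId": "p1", "Level": 7, "Inventory": ["sword", "shield"]}))
    print(to_item({"PlayerId": "p2", "Settings": {"music": False}}))
```

Unlike rows in a relational table, the second item has no `Level` or `Inventory` column to fill with nulls; it simply carries the attributes it needs.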

The net result is that NoSQL databases usually trade some of the query capabilities and ACID properties of an RDBMS for a much more flexible data model that scales horizontally. These characteristics make NoSQL databases an excellent choice in situations where an RDBMS (as in the game state example above) is causing some combination of performance bottlenecks, operational complexity, and rising costs. DynamoDB addresses each of these problems and is an excellent platform for migrating such workloads off an RDBMS. In addition, DynamoDB supports strong consistency and ACID transactions, so even workloads that require these capabilities, which traditionally were not considered suitable for NoSQL databases, can take advantage of DynamoDB’s scalability, flexible data model, and operational simplicity.
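As an illustration of the transaction support mentioned above, the following sketch builds the arguments for a `TransactWriteItems` request that moves funds between two accounts, mirroring the bank example earlier in this section. The table name, the `AccountId` key, the `Balance` attribute, and the `build_transfer_request` helper are hypothetical. The function only constructs the request, so it runs without AWS credentials; with boto3 you would pass the result as `boto3.client("dynamodb").transact_write_items(**request)`.

```python
import uuid


def build_transfer_request(table: str, src: str, dst: str, amount: int) -> dict:
    """Build keyword arguments for a DynamoDB TransactWriteItems call.

    Hypothetical schema: partition key `AccountId` (string), numeric `Balance`.
    Both updates are applied together or not at all.
    """
    values = {":amt": {"N": str(amount)}}
    names = {"#bal": "Balance"}  # alias the attribute name to avoid reserved-word clashes
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": table,
                    "Key": {"AccountId": {"S": src}},
                    "UpdateExpression": "SET #bal = #bal - :amt",
                    # Reject the whole transaction if it would overdraw src.
                    "ConditionExpression": "#bal >= :amt",
                    "ExpressionAttributeNames": names,
                    "ExpressionAttributeValues": values,
                }
            },
            {
                "Update": {
                    "TableName": table,
                    "Key": {"AccountId": {"S": dst}},
                    "UpdateExpression": "SET #bal = #bal + :amt",
                    # Only credit an account that already exists.
                    "ConditionExpression": "attribute_exists(AccountId)",
                    "ExpressionAttributeNames": names,
                    "ExpressionAttributeValues": values,
                }
            },
        ],
        # Idempotency token: retrying with the same token does not repeat the transfer.
        "ClientRequestToken": str(uuid.uuid4()),
    }


if __name__ == "__main__":
    request = build_transfer_request("Accounts", "alice", "bob", 100)
    print(len(request["TransactItems"]))  # 2
```

If either condition fails, DynamoDB cancels the whole transaction and applies neither update; boto3 surfaces this as a `TransactionCanceledException`.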