Cross-Region Replication Using DynamoDB Streams
DynamoDB cross-region replication is a client-side solution for maintaining identical copies of DynamoDB tables across different AWS regions, in near real time. You can use cross-region replication to back up DynamoDB tables, or to provide low-latency access to data where users are geographically distributed.
Cross-region replication supports a single-master model. In this model, you create a master table with DynamoDB Streams enabled, and one or more replica tables. Each replica can reside in a different AWS region. When you modify data in the master table, the cross-region replication solution transparently updates all of the replicas, so that all of the tables in the replication group are kept in sync.
The replica tables are intended to serve as read-only copies of the data; however, it is possible to write data to a replica table. If you write data to a replica, those changes will not be propagated to the master, or to any other replicas. In addition, if you modify an item in the master table, that item will overwrite the same item in all of the replicas.
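The write semantics described above can be illustrated with a small in-memory model. This is only a sketch: the `ReplicationGroup` class, its method names, and the region labels are hypothetical and are not part of the replication solution, and propagation happens instantly here rather than in near real time.

```python
class ReplicationGroup:
    """Toy in-memory model of one master table and its replicas (hypothetical)."""

    def __init__(self, replica_names):
        self.master = {}
        self.replicas = {name: {} for name in replica_names}

    def write_master(self, key, item):
        # A write to the master is propagated to every replica and
        # overwrites whatever the replica holds under the same key.
        self.master[key] = item
        for table in self.replicas.values():
            table[key] = dict(item)

    def write_replica(self, name, key, item):
        # A write to a replica stays local: it is not sent to the master
        # or to any other replica.
        self.replicas[name][key] = item


group = ReplicationGroup(["eu-west-1", "ap-northeast-1"])
group.write_master("user#1", {"name": "Ana"})

# Write directly to one replica, then snapshot what each table holds.
group.write_replica("eu-west-1", "user#1", {"name": "local edit"})
local_view = dict(group.replicas["eu-west-1"]["user#1"])
master_view = dict(group.master["user#1"])
other_view = dict(group.replicas["ap-northeast-1"]["user#1"])

# A later change to the same item in the master replaces the local edit.
group.write_master("user#1", {"name": "Ana B."})
```

After the replica write, only the `eu-west-1` copy differs from the master. After the second master write, all three tables hold the same item again and the local edit is lost.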
The following diagram provides an overview of a cross-region replication setup.
[Figure: Cross-Region Replication Overview]
You launch a preconfigured AWS CloudFormation stack. This is a one-time operation that takes approximately 20 minutes to complete.
The AWS CloudFormation stack uses AWS Elastic Beanstalk to launch the Replication Coordinator and DynamoDB Connector on Amazon EC2 Container Service (Amazon ECS).
You use the Cross Region Replication Console to create a replication group.
The Replication Coordinator allocates all of the necessary resources, including metadata tables and replica tables in other regions. This operation can take up to 30 minutes to complete.
The DynamoDB Connector reads and processes incoming records from the stream on the master table in DynamoDB.
The DynamoDB Connector updates the replica table(s).
After the initial setup, the following components are used to perform the replication tasks.
Cross Region Replication Console—a standalone web application that resembles the AWS Management Console. This application allows you to:
Define a replication group, consisting of a master table and at least one replica. The master and the replica(s) must have the same key schema and attribute definitions. A replica can be located in the same AWS region as the master, or in a different region.
Launch the replication group. The Cross Region Replication Console communicates with the Replication Coordinator to perform create, delete, and update operations for the replication group.
Monitor the performance of replication groups and their members.
Add and remove replicas from a replication group, or delete a replication group.
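The requirement that the master and its replicas share the same key schema and attribute definitions can be checked before defining a replication group. The following is a minimal sketch, assuming table descriptions shaped like the `Table` structure returned by the DynamoDB DescribeTable operation. The function name `schemas_match` and the sample tables are hypothetical.

```python
def schemas_match(master_desc, replica_desc):
    """Return True when both descriptions have the same key schema and
    attribute definitions, ignoring the order of the list entries."""

    def normalized(desc):
        keys = sorted(
            (k["AttributeName"], k["KeyType"]) for k in desc["KeySchema"]
        )
        attrs = sorted(
            (a["AttributeName"], a["AttributeType"])
            for a in desc["AttributeDefinitions"]
        )
        return keys, attrs

    return normalized(master_desc) == normalized(replica_desc)


# Hypothetical descriptions, trimmed to the fields the check uses.
master = {
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "OrderDate", "AttributeType": "S"},
    ],
}
good_replica = {
    "KeySchema": list(reversed(master["KeySchema"])),
    "AttributeDefinitions": list(master["AttributeDefinitions"]),
}
bad_replica = {
    "KeySchema": [{"AttributeName": "CustomerId", "KeyType": "HASH"}],
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
    ],
}

compatible = schemas_match(master, good_replica)
incompatible = schemas_match(master, bad_replica)
```

In practice the two descriptions would come from DescribeTable calls made against each table's own region.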
Replication Coordinator—an application running on Amazon EC2 Container Service that does the following:
Responds to user requests from the Cross Region Replication Console.
Manages replication states and physical copies of a table.
Creates and maintains the following DynamoDB tables:
One metadata table that keeps track of replication activity and enables recovery from failures.
One Kinesis Client Library (KCL) checkpoint table for tracking KCL processing.
An additional KCL checkpoint table per replication path. For example, if you have a replication group with three replicas, the Replication Coordinator will create three additional KCL checkpoint tables. Each additional KCL checkpoint table is created in the same AWS region as its corresponding replica.
Amazon Web Services provides a prewritten AWS CloudFormation template that launches the Replication Coordinator on a customer-provided Amazon ECS cluster.
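As a worked example of the table counts listed above, the sketch below enumerates the DynamoDB tables the Replication Coordinator would maintain for a given set of replicas. The function name, the `purpose` labels, and the region names are illustrative assumptions. The text does not say which region holds the metadata table or the first checkpoint table, so their region is left as `None`.

```python
def coordinator_tables(replica_regions):
    """List the tables maintained for one replication group:
    one metadata table, one KCL checkpoint table, and one additional
    KCL checkpoint table per replica, located in that replica's region."""
    tables = [
        {"purpose": "metadata", "region": None},
        {"purpose": "kcl-checkpoint", "region": None},
    ]
    for region in replica_regions:
        tables.append({"purpose": "kcl-checkpoint-replica", "region": region})
    return tables


# A replication group with three replicas (hypothetical regions).
tables = coordinator_tables(["us-west-2", "eu-west-1", "ap-southeast-2"])
per_replica = [t for t in tables if t["purpose"] == "kcl-checkpoint-replica"]
```

For three replicas this gives 1 + 1 + 3 = 5 tables, three of which are per-replica checkpoint tables.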
DynamoDB Connector—an application that contains the low-level processing logic for replication from a source table to a destination table. The DynamoDB Connector does the following:
Reads incoming DynamoDB Streams records for the source table, using the Kinesis Client Library (KCL) and the DynamoDB Streams Kinesis Adapter libraries.
Analyzes each stream record to determine whether that record is necessary for the current replication tasks.
Groups related stream records into batches based on primary key values, and removes duplicate stream records.
Issues asynchronous conditional write requests to the replica table, in parallel.
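The processing steps above can be sketched in plain Python. This is not the DynamoDB Connector's code: the real connector reads records through the KCL and the DynamoDB Streams Kinesis Adapter, whereas this sketch takes a list of dictionaries shaped like DynamoDB Streams records and writes to an in-memory dictionary. All function names are hypothetical. The relevance filter and the sequence-number check that stands in for a conditional write are assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def key_of(record):
    # Hashable form of the primary key, e.g. (("Id", (("S", "1"),)),)
    keys = record["dynamodb"]["Keys"]
    return tuple(sorted((n, tuple(v.items())) for n, v in keys.items()))


def batch_by_key(records):
    """Filter records, group them by primary key, and drop duplicates."""
    batches = defaultdict(dict)
    for record in records:
        if record.get("eventName") not in ("INSERT", "MODIFY", "REMOVE"):
            continue  # assumed relevance filter
        seq = int(record["dynamodb"]["SequenceNumber"])
        batches[key_of(record)][seq] = record  # same sequence number: duplicate
    return {k: [by_seq[s] for s in sorted(by_seq)] for k, by_seq in batches.items()}


def apply_batch(replica, applied_seq, key, batch):
    # One worker handles all records for a key, so per-key order is kept.
    for record in batch:
        seq = int(record["dynamodb"]["SequenceNumber"])
        if seq <= applied_seq.get(key, -1):
            continue  # stand-in for a conditional write rejecting stale data
        if record["eventName"] == "REMOVE":
            replica.pop(key, None)
        else:
            replica[key] = record["dynamodb"]["NewImage"]
        applied_seq[key] = seq


def replicate(records, replica, applied_seq):
    """Apply the batches for different keys in parallel."""
    batches = batch_by_key(records)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(apply_batch, replica, applied_seq, key, batch)
            for key, batch in batches.items()
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker


def make_record(event_name, item_id, seq, new_image=None):
    body = {"Keys": {"Id": {"S": item_id}}, "SequenceNumber": str(seq)}
    if new_image is not None:
        body["NewImage"] = new_image
    return {"eventName": event_name, "dynamodb": body}


records = [
    make_record("INSERT", "1", 100, {"Id": {"S": "1"}, "v": {"N": "1"}}),
    make_record("MODIFY", "1", 101, {"Id": {"S": "1"}, "v": {"N": "2"}}),
    make_record("MODIFY", "1", 101, {"Id": {"S": "1"}, "v": {"N": "2"}}),
    make_record("INSERT", "2", 102, {"Id": {"S": "2"}, "v": {"N": "7"}}),
    make_record("REMOVE", "2", 103),
]
replica, applied_seq = {}, {}
replicate(records, replica, applied_seq)
```

Grouping by primary key is what makes the parallel step safe in this sketch: records for the same item are applied in sequence-number order by a single worker, while unrelated items proceed independently. Replaying the same records a second time leaves the replica unchanged, because every sequence number has already been applied.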