Implement cross-Region disaster recovery with AWS DMS and Amazon Aurora
Created by Mark Hudson (AWS)
Environment: Production | Technologies: Databases | AWS services: AWS DMS; Amazon RDS; Amazon Aurora
Summary
Natural or human-induced disasters can occur at any time and can impact the availability of services and workloads running in a given Amazon Web Services (AWS) Region. To mitigate the risks, you must develop a disaster recovery (DR) plan that incorporates the built-in cross-Region capabilities of AWS services. For AWS services that do not inherently provide cross-Region functionality, the DR plan must also provide a solution to handle their failover across AWS Regions.
This pattern guides you through a disaster recovery setup involving two Amazon Aurora MySQL-Compatible Edition database clusters in a single Region. To meet DR requirements, the database clusters are configured to use the Amazon Aurora global database feature, so that each database spans multiple AWS Regions. An AWS Database Migration Service (AWS DMS) task replicates data between the clusters in the local Region. However, AWS DMS currently doesn't support task failover between Regions. This pattern includes the steps required to work around that limitation and independently configure AWS DMS in both Regions.
Prerequisites and limitations
Prerequisites
Selected primary and secondary AWS Regions that support Amazon Aurora global databases.
Two independent Amazon Aurora MySQL-Compatible Edition database clusters in a single account in the primary Region.
Database instance class db.r5 or higher (recommended).
An AWS DMS task in the primary Region performing ongoing replication between the existing database clusters.
DR Region resources in place to meet requirements for creating database instances. For more information, see Working with a DB instance in a VPC.
Limitations
For the full list of Amazon Aurora global database limitations, see Limitations of Amazon Aurora global databases.
Product versions
Amazon Aurora MySQL-Compatible Edition 5.7 or 8.0. For more information, see Amazon Aurora versions.
Architecture
Target technology stack
Amazon Aurora MySQL-Compatible Edition global database cluster
AWS DMS
Target architecture
The following diagram shows a global database for two AWS Regions, one with the primary main and reporter databases and AWS DMS replication, and one with the secondary main and reporter databases.
Automation and scale
You can use AWS CloudFormation to create the prerequisite infrastructure in the secondary Region, such as the virtual private cloud (VPC), subnets, and parameter groups. You can also use AWS CloudFormation to create the secondary clusters in the DR Region and add them to the global database. If you used CloudFormation templates to create the database clusters in the primary Region, you can update or augment them with an additional template to create the global database resource. For more information, see Creating an Amazon Aurora DB cluster with two DB instances and Creating a global database cluster for Aurora MySQL.
Finally, you can use CloudFormation to create the AWS DMS tasks in the primary and secondary Regions after failover and failback events occur. For more information, see AWS::DMS::ReplicationTask.
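The following Python sketch illustrates the CloudFormation approach. It builds a template for a CDC-only AWS::DMS::ReplicationTask and takes the start checkpoint as a parameter, so the same template can be deployed in either Region after a failover or failback. Several pieces are illustrative assumptions:

- the parameter names
- the logical ID CdcTask
- the include-all table mapping

In practice, reuse the table mappings of the original task. The sketch also assumes that AWS DMS fills in defaults for any task settings that the partial ReplicationTaskSettings JSON leaves out.

```python
import json


def dms_cdc_task_template(table_mappings):
    """Return a CloudFormation template (as a dict) for a CDC-only DMS task.

    Instance, endpoints, and checkpoint are template parameters so the same
    template can be deployed in the primary or the DR Region.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ReplicationInstanceArn": {"Type": "String"},
            "SourceEndpointArn": {"Type": "String"},
            "TargetEndpointArn": {"Type": "String"},
            # Checkpoint read from awsdms_control.awsdms_txn_state.
            "CdcStartPosition": {"Type": "String"},
        },
        "Resources": {
            "CdcTask": {
                "Type": "AWS::DMS::ReplicationTask",
                "Properties": {
                    "MigrationType": "cdc",
                    "ReplicationInstanceArn": {"Ref": "ReplicationInstanceArn"},
                    "SourceEndpointArn": {"Ref": "SourceEndpointArn"},
                    "TargetEndpointArn": {"Ref": "TargetEndpointArn"},
                    "CdcStartPosition": {"Ref": "CdcStartPosition"},
                    # TableMappings is a JSON string, not a nested object.
                    "TableMappings": json.dumps(table_mappings),
                    # Keep writing checkpoints for the next failover.
                    "ReplicationTaskSettings": json.dumps(
                        {"TargetMetadata": {"TaskRecoveryTableEnabled": True}}
                    ),
                },
            }
        },
        # Ref on AWS::DMS::ReplicationTask returns the task ARN.
        "Outputs": {"TaskArn": {"Value": {"Ref": "CdcTask"}}},
    }


if __name__ == "__main__":
    # Hypothetical mapping that includes every table in every schema.
    mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-all",
                "object-locator": {"schema-name": "%", "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    print(json.dumps(dms_cdc_task_template(mappings), indent=2))
```

You could save the printed JSON and deploy it with aws cloudformation deploy, passing the checkpoint through --parameter-overrides.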
Tools
Amazon Aurora - Amazon Aurora is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. This pattern uses Amazon Aurora MySQL-Compatible Edition.
Amazon Aurora global databases - Amazon Aurora global databases are designed for globally distributed applications. A single Amazon Aurora global database can span multiple AWS Regions. It replicates your data with no impact on database performance. It also enables fast local reads with low latency in each Region, and it provides disaster recovery from Region-wide outages.
AWS DMS - AWS Database Migration Service (AWS DMS) provides one-time migration or ongoing replication. An ongoing replication task keeps your source and target databases in sync. After it is set up, the ongoing replication task continuously applies source changes to the target with minimal latency. All AWS DMS features, such as data validation and transformations, are available for any replication task.
Epics
Task | Description | Skills required |
---|---|---|
Modify the database cluster parameter group. | In the existing database cluster parameter group, activate row-level binary logging by setting the binlog_format parameter to ROW. AWS DMS requires row-level binary logging for MySQL-compatible databases when performing ongoing replication or change data capture (CDC). For more information, see Using an AWS managed MySQL-compatible database as a source for AWS DMS. | AWS administrator |
Update the database binary log retention period. | Using a MySQL client installed on your end-user device or an Amazon Elastic Compute Cloud (Amazon EC2) instance, run the mysql.rds_set_configuration stored procedure on the main database cluster's writer node. This procedure is provided by Amazon Relational Database Service (Amazon RDS). Use it to set 'binlog retention hours' to the number of hours to retain the binary logs (for example, 24).
Confirm the setting by running the mysql.rds_show_configuration stored procedure.
MySQL-compatible databases managed by AWS purge the binary logs as soon as possible. Therefore, the retention period must be long enough to ensure that the logs are not purged before the AWS DMS task runs. A value of 24 hours is usually sufficient, but the value should be based on the time required to set up the AWS DMS task in the DR Region. | DBA |
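Both prerequisites above can also be scripted, as in the following Python sketch. It assumes that you pass in two objects:

- an RDS client (for example, boto3.client("rds"))
- a DB-API cursor connected to the main cluster's writer node (for example, from PyMySQL)

The helper names are hypothetical. The sketch uses pending-reboot because that apply method is accepted for both static and dynamic parameters, so the new binlog format might not take effect until the writer instance is rebooted.

```python
def binlog_format_parameters():
    """Parameters that turn on row-level binary logging for AWS DMS CDC."""
    return [
        {
            "ParameterName": "binlog_format",
            "ParameterValue": "ROW",
            # Accepted for static and dynamic parameters alike.
            "ApplyMethod": "pending-reboot",
        }
    ]


def apply_binlog_format(rds_client, cluster_parameter_group):
    """Modify the existing DB cluster parameter group in place."""
    return rds_client.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=cluster_parameter_group,
        Parameters=binlog_format_parameters(),
    )


def binlog_retention_sql(hours=24):
    """Build the RDS stored-procedure call that sets binlog retention in hours."""
    # Reject non-integers (and bools) so the value can be inlined safely.
    if isinstance(hours, bool) or not isinstance(hours, int) or hours <= 0:
        raise ValueError("hours must be a positive integer")
    return f"CALL mysql.rds_set_configuration('binlog retention hours', {hours})"


def set_binlog_retention(cursor, hours=24):
    """Set the retention period, then return the current RDS configuration rows."""
    cursor.execute(binlog_retention_sql(hours))
    # Read the configuration back to confirm the new value.
    cursor.execute("CALL mysql.rds_show_configuration")
    return cursor.fetchall()
```

For example, set_binlog_retention(cursor, hours=24) applies the 24-hour value discussed above and returns the rows to inspect.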
Task | Description | Skills required |
---|---|---|
Record the AWS DMS task ARN. | Use the Amazon Resource Name (ARN) to obtain the AWS DMS task name for later use. To retrieve the AWS DMS task ARN, view the task in the console or run the aws dms describe-replication-tasks command.
An ARN has the form arn:aws:dms:region:account-id:task:task-name.
The characters after the last colon correspond to the task name used in a later step. | AWS administrator |
Modify the existing AWS DMS task to record the checkpoint. | AWS DMS creates checkpoints that tell the replication engine the recovery point for the change stream. To record checkpoint information on the target, stop the task, modify it in the console to turn on the task setting that creates a recovery table on the target database, and then resume the task.
| AWS administrator |
Validate checkpoint information. | Using a MySQL client connected to the writer endpoint of the reporter database cluster, query the new metadata table, awsdms_control.awsdms_txn_state. Verify that the table exists and contains the replication state information.
The task name from the ARN should appear in the task_name column of this table. | DBA |
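The steps in this epic could be automated roughly as shown in the following Python sketch. This is a sketch under stated assumptions, not a definitive implementation:

- A boto3 DMS client and a DB-API cursor on the reporter cluster's writer are passed in.
- The task_name column name is as described above. Check it against your table.
- The helper names are hypothetical.

The sketch relies on the TargetMetadata.TaskRecoveryTableEnabled task setting, which is what makes AWS DMS maintain the awsdms_txn_state table.

```python
import json


def task_name_from_arn(arn):
    """Return the characters after the last colon of an AWS DMS task ARN."""
    parts = arn.split(":")
    # Expected form: arn:partition:dms:region:account-id:task:name
    if len(parts) != 7 or parts[0] != "arn" or parts[2] != "dms" or parts[5] != "task":
        raise ValueError(f"not an AWS DMS task ARN: {arn}")
    return parts[6]


def with_recovery_table(settings_json):
    """Return task settings JSON with TargetMetadata.TaskRecoveryTableEnabled on."""
    settings = json.loads(settings_json) if settings_json else {}
    settings.setdefault("TargetMetadata", {})["TaskRecoveryTableEnabled"] = True
    return json.dumps(settings)


def enable_recovery_table(dms_client, task_arn):
    """Stop the task, turn on the recovery table, and resume replication."""
    arn_filter = [{"Name": "replication-task-arn", "Values": [task_arn]}]
    task = dms_client.describe_replication_tasks(Filters=arn_filter)["ReplicationTasks"][0]
    # A replication task must be stopped before it can be modified.
    dms_client.stop_replication_task(ReplicationTaskArn=task_arn)
    dms_client.get_waiter("replication_task_stopped").wait(Filters=arn_filter)
    dms_client.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=with_recovery_table(task.get("ReplicationTaskSettings")),
    )
    # The task passes through "modifying" before returning to "stopped";
    # add a short delay here if the first poll can still see the old state.
    dms_client.get_waiter("replication_task_stopped").wait(Filters=arn_filter)
    dms_client.start_replication_task(
        ReplicationTaskArn=task_arn, StartReplicationTaskType="resume-processing"
    )


def recovery_row_exists(cursor, task_name):
    """Check the reporter cluster's recovery table for a row for this task."""
    cursor.execute(
        "SELECT COUNT(*) FROM awsdms_control.awsdms_txn_state WHERE task_name = %s",
        (task_name,),
    )
    return cursor.fetchone()[0] > 0
```

Because resume-processing continues from the task's own checkpoint, the change stream is not restarted from the beginning.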
Task | Description | Skills required |
---|---|---|
Create base infrastructure in the DR Region. | Create the base components required to create and access the Amazon Aurora clusters in the DR Region. These include the VPC, subnets, and DB cluster parameter groups for the main and reporter clusters.
Ensure that the configuration of both parameter groups matches the configuration in the primary Region. | AWS administrator |
Add the DR Region to both Amazon Aurora clusters. | Add a secondary Region (the DR Region) to the main and reporter Amazon Aurora clusters. For more information, see Adding an AWS Region to an Amazon Aurora global database. | AWS administrator |
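You could use a Python sketch like the following to check that the DR Region parameter groups match the primary Region, and then to add the DR Region. The sketch makes these assumptions:

- You pass in one boto3 RDS client per Region.
- The source cluster is unencrypted. An encrypted cluster likely also needs a KMS key in the DR Region.
- Identifiers such as the -instance-1 suffix are hypothetical.

Run add_secondary_region once for the main cluster and once for the reporter cluster.

```python
def cluster_parameters(rds_client, group_name):
    """Return {name: value} for every parameter in a DB cluster parameter group."""
    params, marker = {}, None
    while True:
        kwargs = {"DBClusterParameterGroupName": group_name}
        if marker:
            kwargs["Marker"] = marker
        page = rds_client.describe_db_cluster_parameters(**kwargs)
        for p in page["Parameters"]:
            params[p["ParameterName"]] = p.get("ParameterValue")
        # Keep paging until the response no longer includes a Marker.
        marker = page.get("Marker")
        if not marker:
            return params


def parameter_differences(primary, secondary):
    """Return {name: (primary_value, secondary_value)} for parameters that differ."""
    names = sorted(set(primary) | set(secondary))
    return {
        name: (primary.get(name), secondary.get(name))
        for name in names
        if primary.get(name) != secondary.get(name)
    }


def add_secondary_region(rds_primary, rds_dr, global_id, primary_cluster_arn,
                         dr_cluster_id, engine_version, subnet_group,
                         security_group_ids, parameter_group,
                         instance_class="db.r5.large"):
    """Turn a primary cluster into a global database and add a DR-Region cluster."""
    rds_primary.create_global_cluster(
        GlobalClusterIdentifier=global_id,
        SourceDBClusterIdentifier=primary_cluster_arn,
    )
    rds_dr.create_db_cluster(
        DBClusterIdentifier=dr_cluster_id,
        Engine="aurora-mysql",
        EngineVersion=engine_version,  # should match the primary cluster
        GlobalClusterIdentifier=global_id,
        DBSubnetGroupName=subnet_group,
        VpcSecurityGroupIds=security_group_ids,
        DBClusterParameterGroupName=parameter_group,
    )
    # Add a DB instance so that the DR cluster can serve reads.
    rds_dr.create_db_instance(
        DBInstanceIdentifier=f"{dr_cluster_id}-instance-1",  # hypothetical name
        DBClusterIdentifier=dr_cluster_id,
        Engine="aurora-mysql",
        DBInstanceClass=instance_class,
    )
```

An empty result from parameter_differences(cluster_parameters(rds_primary, "pg"), cluster_parameters(rds_dr, "pg")) indicates that the two groups match.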
Task | Description | Skills required |
---|---|---|
Stop the AWS DMS task. | The AWS DMS task in the primary Region will not function properly after failover occurs and should be stopped to avoid errors. | AWS administrator |
Perform a managed failover. | Perform a managed failover of the main database cluster to the DR Region. For instructions, see Performing managed planned failovers for Amazon Aurora global databases. After failover on the main database cluster is complete, perform the same activity on the reporter database cluster. | AWS administrator, DBA |
Load data into the main database. | Insert test data into the writer node of the main database cluster in the DR Region. This data will be used to validate that replication is functioning properly. | DBA |
Create the AWS DMS replication instance. | To create the AWS DMS replication instance in the DR Region, see Creating a replication instance. | AWS administrator, DBA |
Create the AWS DMS source and target endpoints. | To create the AWS DMS source and target endpoints in the DR Region, see Creating source and target endpoints. The source should point to the writer instance of the main database cluster. The target should point to the writer instance of the reporter database cluster. | AWS administrator, DBA |
Obtain the replication checkpoint. | To obtain the replication checkpoint, use a MySQL client to query the awsdms_control.awsdms_txn_state metadata table on the writer node of the reporter database cluster in the DR Region.
In the table, find the row whose task_name value matches the task name from the primary Region AWS DMS task ARN that you recorded in the second epic. Copy the checkpoint value from that row. | DBA |
Create an AWS DMS task. | Using the console, create an AWS DMS task in the DR Region. In the task, specify a migration method of Replicate data changes only, and use the checkpoint from the previous step as the CDC start position. For more information, see Creating a task.
Set the AWS DMS task Start migration task setting to Automatically on create. | AWS administrator, DBA |
Record the AWS DMS task ARN. | Use the ARN to obtain the AWS DMS task name for later use. To retrieve the AWS DMS task ARN, run the aws dms describe-replication-tasks command.
| AWS administrator, DBA |
Validate the replicated data. | Query the reporter database cluster in the DR Region to confirm that the test data that you loaded into the main database cluster has been replicated. | DBA |
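The failover and DR-Region task steps can be sketched in Python as follows. The sketch assumes that boto3 RDS and DMS clients and a DB-API cursor are passed in. The function planned_failover relies on failover_global_cluster performing a switchover (planned failover) when AllowDataLoss is not set; newer SDKs also offer switchover_global_cluster. The following points are assumptions to verify:

- the awsdms_txn_state column names
- the helper names
- passing the stored checkpoint string unchanged as CdcStartPosition

The new task also turns on the recovery table so that a checkpoint exists for the later failback. Copy the original task's table mappings into table_mappings_json.

```python
import json
import time

# Task settings that keep the recovery table on for the new task.
RECOVERY_SETTINGS = json.dumps({"TargetMetadata": {"TaskRecoveryTableEnabled": True}})


def writer_arns(global_cluster):
    """Return the ARNs of writer members in one describe_global_clusters() item."""
    return [
        m["DBClusterArn"]
        for m in global_cluster["GlobalClusterMembers"]
        if m["IsWriter"]
    ]


def planned_failover(rds_client, global_id, target_cluster_arn, poll_seconds=30):
    """Promote the DR cluster and wait until it is the only writer."""
    # No AllowDataLoss argument, so the operation is a planned switchover.
    rds_client.failover_global_cluster(
        GlobalClusterIdentifier=global_id,
        TargetDbClusterIdentifier=target_cluster_arn,
    )
    while True:
        gc = rds_client.describe_global_clusters(
            GlobalClusterIdentifier=global_id
        )["GlobalClusters"][0]
        if gc["Status"] == "available" and writer_arns(gc) == [target_cluster_arn]:
            return
        time.sleep(poll_seconds)


def select_checkpoint(rows, task_name):
    """Return the checkpoint for one task from (task_name, checkpoint) rows."""
    matches = [checkpoint for name, checkpoint in rows if name == task_name]
    if len(matches) != 1:
        raise LookupError(f"expected 1 row for {task_name}, found {len(matches)}")
    return matches[0]


def read_checkpoint(cursor, task_name):
    # Column names are assumptions; confirm with DESCRIBE awsdms_control.awsdms_txn_state.
    cursor.execute(
        "SELECT task_name, `checkpoint` FROM awsdms_control.awsdms_txn_state"
    )
    return select_checkpoint(cursor.fetchall(), task_name)


def cdc_task_request(task_id, instance_arn, source_arn, target_arn,
                     table_mappings_json, checkpoint):
    """Build create_replication_task arguments for a CDC-only task."""
    return {
        "ReplicationTaskIdentifier": task_id,
        "ReplicationInstanceArn": instance_arn,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "MigrationType": "cdc",  # "Replicate data changes only" in the console
        "TableMappings": table_mappings_json,
        "CdcStartPosition": checkpoint,
        "ReplicationTaskSettings": RECOVERY_SETTINGS,
    }


def create_and_start(dms_client, request):
    """Create the task, wait until it is ready, start it, and return its ARN."""
    arn = dms_client.create_replication_task(**request)["ReplicationTask"]["ReplicationTaskArn"]
    dms_client.get_waiter("replication_task_ready").wait(
        Filters=[{"Name": "replication-task-arn", "Values": [arn]}]
    )
    dms_client.start_replication_task(
        ReplicationTaskArn=arn, StartReplicationTaskType="start-replication"
    )
    return arn
```

Call planned_failover for the main cluster first, then for the reporter cluster. Next, call read_checkpoint with the primary task name and pass the result to cdc_task_request. The ARN that create_and_start returns is the one to record for failback.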
Task | Description | Skills required |
---|---|---|
Stop the AWS DMS task. | The AWS DMS task in the DR Region will not function properly after failback occurs and should be stopped to avoid errors. | AWS administrator |
Perform a managed failback. | Fail back the main database cluster to the primary Region. For instructions, see Performing managed planned failovers for Amazon Aurora global databases. After the failback on the main database cluster is complete, perform the same activity on the reporter database cluster. | AWS administrator, DBA |
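Obtain the replication checkpoint. | To obtain the replication checkpoint, use a MySQL client to query the awsdms_control.awsdms_txn_state metadata table on the writer node of the reporter database cluster, which is now in the primary Region.
In the table, find the row whose task_name value matches the task name from the DR Region AWS DMS task ARN that you recorded earlier. Copy the checkpoint value from that row. | DBA |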
Update the AWS DMS source and target endpoints. | After the database clusters have failed back, check the clusters in the primary Region to determine which nodes are the writer instances. Then verify that the existing AWS DMS source and target endpoints in the primary Region point to the writer instances. If they don't, update the endpoints with the Domain Name System (DNS) names of the writer instances. | AWS administrator |
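Create an AWS DMS task. | Using the console, create an AWS DMS task in the primary Region. In the task, specify a migration method of Replicate data changes only, use the checkpoint from the previous step as the CDC start position, and set the Start migration task setting to Automatically on create. For more information, see Creating a task.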
| AWS administrator, DBA |
Record the AWS DMS task Amazon Resource Name (ARN). | Use the ARN to obtain the AWS DMS task name for later use. To retrieve the AWS DMS task ARN, run the aws dms describe-replication-tasks command.
The task name will be needed when performing another managed failover or during a DR scenario. | AWS administrator, DBA |
Delete AWS DMS tasks. | Delete the original (currently stopped) AWS DMS task in the primary Region and the existing AWS DMS task (currently stopped) in the secondary Region. | AWS administrator |
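For the failback epic, the endpoint check and the task cleanup could look like the following Python sketch. It assumes that you pass in boto3 RDS and DMS clients for the Region being updated, and the helper names are hypothetical. Because the endpoints should target writer instances rather than the cluster endpoint, the sketch resolves the current writer from the cluster's DBClusterMembers list.

```python
def writer_instance_id(cluster):
    """Return the writer instance ID from one describe_db_clusters()["DBClusters"] item."""
    writers = [
        m["DBInstanceIdentifier"]
        for m in cluster["DBClusterMembers"]
        if m["IsClusterWriter"]
    ]
    if len(writers) != 1:
        raise RuntimeError(f"expected one writer instance, found {writers}")
    return writers[0]


def writer_address(rds_client, cluster_id):
    """Return the DNS name of the cluster's current writer instance."""
    cluster = rds_client.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    instance = rds_client.describe_db_instances(
        DBInstanceIdentifier=writer_instance_id(cluster)
    )["DBInstances"][0]
    return instance["Endpoint"]["Address"]


def point_endpoint_at_writer(dms_client, rds_client, endpoint_arn, cluster_id):
    """Update a DMS endpoint's ServerName if it no longer targets the writer."""
    address = writer_address(rds_client, cluster_id)
    endpoint = dms_client.describe_endpoints(
        Filters=[{"Name": "endpoint-arn", "Values": [endpoint_arn]}]
    )["Endpoints"][0]
    if endpoint.get("ServerName") != address:
        dms_client.modify_endpoint(EndpointArn=endpoint_arn, ServerName=address)
    return address


def delete_stopped_task(dms_client, task_arn):
    """Delete a replication task, refusing if it is not stopped."""
    task = dms_client.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )["ReplicationTasks"][0]
    if task["Status"] != "stopped":
        raise RuntimeError(f"{task_arn} is {task['Status']}, not stopped")
    dms_client.delete_replication_task(ReplicationTaskArn=task_arn)
```

Call point_endpoint_at_writer for the source endpoint with the main cluster and for the target endpoint with the reporter cluster. Then call delete_stopped_task with a DMS client for each Region that holds an old task.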
Additional information
Amazon Aurora global databases are used in this example for DR because they provide an effective recovery point objective (RPO) of 1 second and a recovery time objective (RTO) of less than 1 minute. Both are lower than those of traditional replicated solutions, which makes global databases well suited to DR scenarios.
Amazon Aurora global databases offer many other advantages, including the following:
Global reads with local latency – Global consumers can access information in a local Region, with local latency.
Scalable secondary Amazon Aurora DB clusters – Secondary clusters can be scaled independently, adding up to 16 read-only replicas.
Fast replication from primary to secondary Amazon Aurora DB clusters – Replication has little performance impact on the primary cluster. It occurs at the storage layer, with typical cross-Region replication latencies of less than 1 second.
This pattern also uses AWS DMS for replication. Amazon Aurora databases provide the ability to create read replicas, which can simplify the replication process and the DR setup. However, read replicas are exact copies of the source. AWS DMS is used instead when data transformations are required or when the target database needs additional indexes that the source database does not have.