Automate cross-Region failover and failback by using DR Orchestrator Framework - AWS Prescriptive Guidance

Created by Jitendra Kumar (AWS), Oliver Francis (AWS), and Pavithra Balasubramanian (AWS)

Code repository: aws-cross-region-dr-databases

Environment: Production

Technologies: Databases; Infrastructure; Migration; Modernization

AWS services: Amazon Aurora; AWS CloudFormation; Amazon ElastiCache; Amazon RDS; AWS Step Functions

Summary

This pattern describes how to use DR Orchestrator Framework to orchestrate and automate the manual, error-prone steps to perform disaster recovery across Amazon Web Services (AWS) Regions. The pattern covers the following databases:

  • Amazon Relational Database Service (Amazon RDS) for MySQL, Amazon RDS for PostgreSQL, or Amazon RDS for MariaDB

  • Amazon Aurora MySQL-Compatible Edition or Amazon Aurora PostgreSQL-Compatible Edition (using a centralized file)

  • Amazon ElastiCache (Redis OSS)

To demonstrate the functionality of DR Orchestrator Framework, you create two DB instances or clusters. The primary is in the AWS Region us-east-1, and the secondary is in us-west-2. To create these resources, you use the AWS CloudFormation templates in the App-Stack folder of the aws-cross-region-dr-databases GitHub repository.

Prerequisites and limitations

General prerequisites

Engine-specific prerequisites

  • Amazon Aurora – At least one Aurora global database must be available in two AWS Regions. You can use us-east-1 as the primary Region, and use us-west-2 as the secondary Region.

  • Amazon ElastiCache (Redis OSS) – An ElastiCache global datastore must be available in two AWS Regions. You can use us-east-1 as the primary Region, and use us-west-2 as the secondary Region.

Amazon RDS limitations

  • DR Orchestrator Framework doesn't check the replication lag before performing a failover or failback. You must check the replication lag manually.

  • This solution has been tested using a primary database instance with one read replica. If you want to use more than one read replica, test the solution thoroughly before implementing it in a production environment.
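Because the framework doesn't check replication lag, one way to perform the manual check is to read the ReplicaLag metric that Amazon RDS publishes to Amazon CloudWatch for the read replica. The following is a minimal sketch, not part of DR Orchestrator Framework. The function names, the 10-minute window, and the 5-second threshold are assumptions for illustration. The CloudWatch client is passed in so that you can create it for the Region that hosts the replica.

```python
from datetime import datetime, timedelta, timezone


def latest_replica_lag_seconds(cloudwatch, replica_id, window_minutes=10):
    """Return the most recent ReplicaLag datapoint in seconds, or None if no data."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="ReplicaLag",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        StartTime=end - timedelta(minutes=window_minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    # Datapoints are not guaranteed to be ordered, so pick the newest one.
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return newest["Maximum"]


def safe_to_promote(lag_seconds, max_lag_seconds=5.0):
    """Treat missing data as unsafe so that an operator investigates first."""
    # The 5-second default is an illustrative assumption, not a recommendation.
    return lag_seconds is not None and lag_seconds <= max_lag_seconds
```

To use the sketch, create the client with boto3.client("cloudwatch", region_name="us-west-2") and pass the identifier of your read replica. Choose a threshold that matches your own recovery point objective.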

Aurora limitations

  • Feature availability and support vary across specific versions of each database engine and across AWS Regions. For more information on feature and Region availability for cross-Region replication, see Cross-Region read replicas.

  • Aurora global databases have specific configuration requirements for supported Aurora DB instance classes and the maximum number of AWS Regions. For more information, see Configuration requirements of an Amazon Aurora global database.

  • This solution has been tested using a primary database instance with one read replica. If you want to use more than one read replica, test the solution thoroughly before implementing it in a production environment.

ElastiCache limitations

  • For information about Region availability for Global Datastore and ElastiCache configuration requirements, see Prerequisites and limitations in the ElastiCache documentation.

Amazon RDS product versions

Amazon RDS supports the following engine versions:

  • MySQL – Amazon RDS supports DB instances running the following versions of MySQL: MySQL 8.0 and MySQL 5.7.

  • PostgreSQL – For information about supported versions of Amazon RDS for PostgreSQL, see Available PostgreSQL database versions.

  • MariaDB – Amazon RDS supports DB instances running the following versions of MariaDB:

    • MariaDB 10.11

    • MariaDB 10.6

    • MariaDB 10.5

Aurora product versions

ElastiCache (Redis OSS) product versions

Amazon ElastiCache (Redis OSS) supports the following Redis versions:

  • Redis 7.1 (enhanced)

  • Redis 7.0 (enhanced)

  • Redis 6.2 (enhanced)

  • Redis 6.0 (enhanced)

  • Redis 5.0.6 (enhanced)

For more information, see Supported ElastiCache (Redis OSS) versions.

Architecture

Amazon RDS architecture

The Amazon RDS architecture includes the following resources:

  • The primary Amazon RDS DB instance created in the primary Region (us-east-1) with read/write access for clients

  • An Amazon RDS read replica created in the secondary Region (us-west-2) with read-only access for clients

  • DR Orchestrator Framework deployed in both the primary and secondary Regions

Diagram of two-Region RDS architecture in a single AWS account.

The diagram shows the following:

  1. Asynchronous replication between the primary instance and the secondary instance

  2. Read/write access for clients in the primary Region

  3. Read-only access for clients in the secondary Region

Aurora architecture

The Amazon Aurora architecture includes the following resources:

  • The primary Aurora DB cluster created in the primary Region (us-east-1) with an active-writer endpoint

  • An Aurora DB cluster created in the secondary Region (us-west-2) with an inactive-writer endpoint

  • DR Orchestrator Framework deployed in both the primary and secondary Regions

Diagram of two-Region Aurora deployment in a single AWS account.

The diagram shows the following:

  1. Asynchronous replication between the primary cluster and the secondary cluster

  2. The primary DB cluster with an active-writer endpoint

  3. The secondary DB cluster with an inactive-writer endpoint

ElastiCache (Redis OSS) architecture

The Amazon ElastiCache (Redis OSS) architecture includes the following resources:

  • An ElastiCache (Redis OSS) global datastore created with two clusters:

    1. The primary cluster in the primary Region (us-east-1)

    2. The secondary cluster in the secondary Region (us-west-2)

  • An Amazon cross-Region link with TLS 1.2 encryption between the two clusters

  • DR Orchestrator Framework deployed in both primary and secondary Regions

Diagram of a two-Region ElastiCache deployment with Amazon cross-Region link.

Automation and scale

DR Orchestrator Framework is scalable and supports the failover or failback of more than one AWS database in parallel.

You can use the following payload code to fail over multiple AWS databases in your account. In this example, three AWS databases (two Aurora global databases, one Aurora MySQL-Compatible and one Aurora PostgreSQL-Compatible, and one Amazon RDS for MySQL instance) fail over to the DR Region:

{
  "StatePayload": [
    {
      "layer": 1,
      "resources": [
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (MySQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-mysql-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-mysql-cluster-identifier"
          }
        },
        {
          "resourceType": "PlannedFailoverAurora",
          "resourceName": "Switchover (planned failover) of Amazon Aurora global databases (PostgreSQL)",
          "parameters": {
            "GlobalClusterIdentifier": "!Import dr-globaldb-cluster-postgres-global-identifier",
            "DBClusterIdentifier": "!Import dr-globaldb-cluster-postgres-cluster-identifier"
          }
        },
        {
          "resourceType": "PromoteRDSReadReplica",
          "resourceName": "Promote RDS for MySQL Read Replica",
          "parameters": {
            "RDSInstanceIdentifier": "!Import rds-mysql-instance-identifier",
            "TargetClusterIdentifier": "!Import rds-mysql-instance-global-arn"
          }
        }
      ]
    }
  ]
}
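If you maintain payloads for many databases, you might generate them instead of editing JSON by hand. The following sketch builds a document with the same shape as the payload above. The helper names build_state_payload and aurora_switchover are hypothetical, and the set of required keys is inferred only from the example, not from a published schema.

```python
import json

REQUIRED_KEYS = {"resourceType", "resourceName", "parameters"}


def build_state_payload(layers):
    """Build a StatePayload document from {layer_number: [resource, ...]}."""
    payload = []
    for layer_number in sorted(layers):
        resources = layers[layer_number]
        for resource in resources:
            missing = REQUIRED_KEYS - resource.keys()
            if missing:
                raise ValueError(f"layer {layer_number}: missing {sorted(missing)}")
        payload.append({"layer": layer_number, "resources": list(resources)})
    return {"StatePayload": payload}


def aurora_switchover(engine_label, export_prefix):
    """Resource entry shaped like the PlannedFailoverAurora entries in the example."""
    return {
        "resourceType": "PlannedFailoverAurora",
        "resourceName": (
            "Switchover (planned failover) of Amazon Aurora global databases "
            f"({engine_label})"
        ),
        "parameters": {
            "GlobalClusterIdentifier": f"!Import {export_prefix}-global-identifier",
            "DBClusterIdentifier": f"!Import {export_prefix}-cluster-identifier",
        },
    }


payload = build_state_payload({
    1: [
        aurora_switchover("MySQL", "dr-globaldb-cluster-mysql"),
        aurora_switchover("PostgreSQL", "dr-globaldb-cluster-postgres"),
    ]
})
print(json.dumps(payload, indent=2))
```

The validation step catches a resource entry that is missing a key before the payload reaches the state machine.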

Tools

AWS services

  • Amazon Aurora is a fully managed relational database engine that's built for the cloud and compatible with MySQL and PostgreSQL.

  • Amazon ElastiCache helps you set up, manage, and scale distributed in-memory cache environments in the AWS Cloud. This pattern uses Amazon ElastiCache (Redis OSS).

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use. In this pattern, Lambda functions are used by AWS Step Functions to perform the steps.

  • Amazon Relational Database Service (Amazon RDS) helps you set up, operate, and scale a relational database in the AWS Cloud. This pattern supports Amazon RDS for MySQL, Amazon RDS for PostgreSQL, and Amazon RDS for MariaDB.

  • AWS SDK for Python (Boto3) helps you integrate your Python application, library, or script with AWS services. In this pattern, Boto3 APIs are used to communicate with the database instances or global databases.

  • AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications. In this pattern, Step Functions state machines are used to orchestrate and run the cross-Region failover and failback of the database instances or global databases.

Code repository

The code for this pattern is available in the aws-cross-region-dr-databases repository on GitHub.

Epics

Task | Description | Skills required

Clone the GitHub repository.

To clone the repository, run the following command:

git clone https://github.com/aws-samples/aws-cross-region-dr-databases.git
AWS DevOps, AWS administrator

Package Lambda functions code in a .zip file archive.

Create the archive files for Lambda functions to include the DR Orchestrator Framework dependencies:

cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts
bash scripts/deploy-orchestrator-sh.sh
AWS administrator

Create S3 buckets.

S3 buckets are needed to store DR Orchestrator Framework along with your latest configuration. Create two S3 buckets, one in the primary Region (us-east-1), and one in the secondary Region (us-west-2):

  • dr-orchestrator-xxxxxx-us-east-1

  • dr-orchestrator-xxxxxx-us-west-2

Replace xxxxxx with a random value to make the bucket names unique.
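As an illustration of the naming scheme, the following sketch generates one random suffix and derives both bucket names from it. It also prepares the arguments for the Amazon S3 CreateBucket call. The function names are hypothetical, and the six-character hexadecimal suffix is an assumption that matches the length of the xxxxxx placeholder.

```python
import secrets

REGIONS = ("us-east-1", "us-west-2")


def bucket_names(suffix=None, regions=REGIONS):
    """Return {region: bucket_name}, sharing one random suffix across Regions."""
    suffix = suffix or secrets.token_hex(3)  # six lowercase hexadecimal characters
    return {region: f"dr-orchestrator-{suffix}-{region}" for region in regions}


def create_bucket_kwargs(bucket, region):
    """Arguments for the S3 CreateBucket call in the given Region."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        # us-east-1 is the default location and takes no LocationConstraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs
```

With Boto3, you could then call boto3.client("s3", region_name=region).create_bucket(**create_bucket_kwargs(name, region)) for each entry that bucket_names() returns.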

AWS administrator

Create subnets and security groups.

In both the primary Region (us-east-1) and the secondary Region (us-west-2), create two subnets and one security group for Lambda function deployment in your VPC:

  • subnet-XXXXXXX

  • subnet-YYYYYYY

  • sg-XXXXXXXXXXXX

AWS administrator

Update the DR Orchestrator parameter files.

In the <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts/cloudformation folder, update the following DR Orchestrator parameter files:

  • Orchestrator-Deployer-parameters-us-east-1.json

  • Orchestrator-Deployer-parameters-us-west-2.json

Use the following parameter values, replacing x and y with the names of your resources:

[
  {
    "ParameterKey": "TemplateStoreS3BucketName",
    "ParameterValue": "dr-orchestrator-xxxxxx-us-east-1"
  },
  {
    "ParameterKey": "TemplateVPCId",
    "ParameterValue": "vpc-xxxxxx"
  },
  {
    "ParameterKey": "TemplateLambdaSubnetID1",
    "ParameterValue": "subnet-xxxxxx"
  },
  {
    "ParameterKey": "TemplateLambdaSubnetID2",
    "ParameterValue": "subnet-yyyyyy"
  },
  {
    "ParameterKey": "TemplateLambdaSecurityGroupID",
    "ParameterValue": "sg-xxxxxxxxxx"
  }
]
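A placeholder that is left in a parameter file typically surfaces only when the stack deployment fails. As a quick check before you deploy, the following sketch lists the parameters whose values still look like the x and y placeholders. The function names and the placeholder pattern (four or more repeated x or y characters) are assumptions for illustration.

```python
import json
import re

# Matches runs such as "xxxxxx" or "yyyyyy" that remain from the sample files.
PLACEHOLDER = re.compile(r"x{4,}|y{4,}", re.IGNORECASE)


def unresolved_parameters(parameters):
    """Return the ParameterKey names whose values still look like placeholders."""
    return [
        entry["ParameterKey"]
        for entry in parameters
        if PLACEHOLDER.search(entry["ParameterValue"])
    ]


def load_parameters(path):
    """Read a parameter file in the ParameterKey/ParameterValue list format."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, unresolved_parameters(load_parameters("Orchestrator-Deployer-parameters-us-east-1.json")) returns an empty list after every placeholder has been replaced.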
AWS administrator

Upload the DR Orchestrator Framework code to the S3 bucket.

The code will be safer in an S3 bucket than in the local directory. Upload the DR-Orchestration-artifacts directory, including all files and subfolders, to the S3 buckets.

To upload the code, do the following:

  1. Sign in to the AWS Management Console.

  2. Navigate to the Amazon S3 console.

  3. Select the dr-orchestrator-xxxxxx-us-east-1 bucket.

  4. Choose Upload, and then choose Add folder.

  5. Select the DR-Orchestration-artifacts folder.

  6. Choose Upload.

  7. Select the dr-orchestrator-xxxxxx-us-west-2 bucket.

  8. Repeat steps 4–6.
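If you prefer to script the upload instead of using the console, the steps above can be sketched with Boto3 as follows. This is an illustrative alternative under stated assumptions: the function names are hypothetical, and the sketch keeps the folder name as the key prefix on the assumption that this mirrors what Add folder does in the console. The S3 client is passed in so that you can create one for each Region.

```python
from pathlib import Path


def plan_uploads(folder):
    """Yield (local_path, s3_key) pairs, keeping the folder name as the key prefix."""
    root = Path(folder).resolve()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            key = (Path(root.name) / path.relative_to(root)).as_posix()
            yield path, key


def upload_folder(s3_client, folder, bucket):
    """Upload every file under folder to bucket and return the uploaded keys."""
    keys = []
    for path, key in plan_uploads(folder):
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys
```

Call upload_folder once for each bucket, for example with boto3.client("s3", region_name="us-east-1") and the dr-orchestrator-xxxxxx-us-east-1 bucket, and then again for the us-west-2 bucket.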

AWS administrator

Deploy DR Orchestrator Framework in the primary Region.

To deploy DR Orchestrator Framework in the primary Region (us-east-1), run the following commands:

cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts/cloudformation
aws cloudformation deploy \
  --region us-east-1 \
  --stack-name dr-orchestrator \
  --template-file Orchestrator-Deployer.yaml \
  --parameter-overrides file://Orchestrator-Deployer-parameters-us-east-1.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback
AWS administrator

Deploy DR Orchestrator Framework in the secondary Region.

In the secondary Region (us-west-2), run the following commands:

cd <YOUR-LOCAL-GIT-FOLDER>/DR-Orchestration-artifacts/cloudformation
aws cloudformation deploy \
  --region us-west-2 \
  --stack-name dr-orchestrator \
  --template-file Orchestrator-Deployer.yaml \
  --parameter-overrides file://Orchestrator-Deployer-parameters-us-west-2.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback
AWS administrator

Verify the deployment.

If the AWS CloudFormation command runs successfully, it returns the following output:

Successfully created/updated stack - dr-orchestrator

Alternatively, you can navigate to the AWS CloudFormation console and verify the status of the dr-orchestrator stack.
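You can also check the stack status programmatically in both Regions. The following is a minimal sketch that uses the CloudFormation DescribeStacks operation. The function names are hypothetical, and the set of statuses treated as successful is an assumption that covers a first deployment and a later update.

```python
SUCCESS_STATES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def stack_status(cloudformation, stack_name="dr-orchestrator"):
    """Return the StackStatus string reported by DescribeStacks."""
    response = cloudformation.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["StackStatus"]


def deployment_report(client_for_region, regions=("us-east-1", "us-west-2")):
    """Map each Region to (status, succeeded) for the dr-orchestrator stack."""
    report = {}
    for region in regions:
        status = stack_status(client_for_region(region))
        report[region] = (status, status in SUCCESS_STATES)
    return report
```

For example, deployment_report(lambda region: boto3.client("cloudformation", region_name=region)) returns one entry for each Region. Any status outside SUCCESS_STATES indicates that the stack needs attention in the console.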

AWS administrator
Task | Description | Skills required

Create the database subnets and security groups.

In your VPC, create two subnets and one security group for the DB instance or global database in both the primary (us-east-1) and the secondary (us-west-2) Regions:

  • subnet-XXXXXX

  • subnet-XXXXXX

  • sg-XXXXXXXXXX

AWS administrator

Update the parameter file for the primary DB instance or cluster.

In the <YOUR-LOCAL-GIT-FOLDER>/App-Stack folder, update the parameter file for the primary Region.

Amazon RDS

In the RDS-MySQL-parameter-us-east-1.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created:

{
  "Parameters": {
    "SubnetIds": "subnet-xxxxxx,subnet-xxxxxx",
    "DBSecurityGroup": "sg-xxxxxxxxxx",
    "MySqlGlobalIdentifier": "rds-mysql-instance",
    "InitialDatabaseName": "mysqldb",
    "DBPortNumber": "3789",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2",
    "KMSKeyAliasName": "rds/rds-mysql-instance-KmsKeyId"
  }
}

Amazon Aurora

In the Aurora-MySQL-parameter-us-east-1.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created:

{
  "Parameters": {
    "SubnetIds": "subnet1-xxxxxx,subnet2-xxxxxx",
    "DBSecurityGroup": "sg-xxxxxxxxxx",
    "GlobalClusterIdentifier": "dr-globaldb-cluster-mysql",
    "DBClusterName": "dbcluster-01",
    "SourceDBClusterName": "dbcluster-02",
    "DBPortNumber": "3787",
    "DBInstanceClass": "db.r5.large",
    "InitialDatabaseName": "sampledb",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2",
    "KMSKeyAliasName": "rds/dr-globaldb-cluster-mysql-KmsKeyId"
  }
}

Amazon ElastiCache (Redis OSS)

In the ElastiCache-parameter-us-east-1.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created.

{
  "Parameters": {
    "CacheNodeType": "cache.m5.large",
    "DBSecurityGroup": "sg-xxxxxxxxxx",
    "SubnetIds": "subnet-xxxxxx,subnet-xxxxxx",
    "EngineVersion": "5.0.6",
    "GlobalReplicationGroupIdSuffix": "demo-redis-global-datastore",
    "NumReplicas": "1",
    "NumShards": "1",
    "ReplicationGroupId": "demo-redis-cluster",
    "DBPortNumber": "3788",
    "TransitEncryption": "true",
    "KMSKeyAliasName": "elasticache/demo-redis-global-datastore-KmsKeyId",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2"
  }
}
AWS administrator

Deploy your DB instance or cluster in the primary Region.

To deploy your instance or cluster in the primary Region (us-east-1), run the following commands based on your database engine.

Amazon RDS

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-east-1 \
  --stack-name rds-mysql-app-stack \
  --template-file RDS-MySQL-Primary.yaml \
  --parameter-overrides file://RDS-MySQL-parameter-us-east-1.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Amazon Aurora

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-east-1 \
  --stack-name aurora-mysql-app-stack \
  --template-file Aurora-MySQL-Primary.yaml \
  --parameter-overrides file://Aurora-MySQL-parameter-us-east-1.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Amazon ElastiCache (Redis OSS)

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-east-1 \
  --stack-name elasticache-ds-app-stack \
  --template-file ElastiCache-Primary.yaml \
  --parameter-overrides file://ElastiCache-parameter-us-east-1.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Verify that the AWS CloudFormation resources deployed successfully.

AWS administrator

Update the parameter file for the secondary DB instance or cluster.

In the <YOUR-LOCAL-GIT-FOLDER>/App-Stack folder, update the parameter file for the secondary Region.

Amazon RDS

In the RDS-MySQL-parameter-us-west-2.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created. Update the PrimaryRegionKMSKeyArn with the value of MySQLKmsKeyId taken from the Outputs section of the AWS CloudFormation stack for the primary DB instance:

{
  "Parameters": {
    "SubnetIds": "subnet-aaaaaaaaa,subnet-bbbbbbbbb",
    "DBSecurityGroup": "sg-cccccccccc",
    "MySqlGlobalIdentifier": "rds-mysql-instance",
    "InitialDatabaseName": "mysqldb",
    "DBPortNumber": "3789",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2",
    "KMSKeyAliasName": "rds/rds-mysql-instance-KmsKeyId",
    "PrimaryRegionKMSKeyArn": "arn:aws:kms:us-east-1:xxxxxxxxx:key/mrk-xxxxxxxxxxxxxxxxxxxxx"
  }
}
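Instead of copying the key ARN from the console, you could read it from the primary stack's outputs. The following sketch assumes that the primary stack is named rds-mysql-app-stack, as in the deployment command for the primary Region, and that it exposes the MySQLKmsKeyId output described above. The function names are hypothetical.

```python
def stack_output(cloudformation, stack_name, output_key):
    """Return the value of one output from a CloudFormation stack."""
    stack = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]
    for output in stack.get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    raise KeyError(f"{output_key} not found in outputs of {stack_name}")


def with_primary_kms_key(parameter_file, key_arn):
    """Return a copy of the parameter document with PrimaryRegionKMSKeyArn set."""
    updated = {"Parameters": dict(parameter_file["Parameters"])}
    updated["Parameters"]["PrimaryRegionKMSKeyArn"] = key_arn
    return updated
```

Create the client for the primary Region, for example boto3.client("cloudformation", region_name="us-east-1"), because the output belongs to the stack that you deployed there.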

Amazon Aurora

In the Aurora-MySQL-parameter-us-west-2.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created. Update the PrimaryRegionKMSKeyArn with the value of AuroraKmsKeyId taken from the Outputs section of the AWS CloudFormation stack for the primary DB cluster:

{
  "Parameters": {
    "SubnetIds": "subnet1-aaaaaaaaa,subnet2-bbbbbbbbb",
    "DBSecurityGroup": "sg-cccccccccc",
    "GlobalClusterIdentifier": "dr-globaldb-cluster-mysql",
    "DBClusterName": "dbcluster-01",
    "SourceDBClusterName": "dbcluster-02",
    "DBPortNumber": "3787",
    "DBInstanceClass": "db.r5.large",
    "InitialDatabaseName": "sampledb",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2",
    "KMSKeyAliasName": "rds/dr-globaldb-cluster-mysql-KmsKeyId"
  }
}

Amazon ElastiCache (Redis OSS)

In the ElastiCache-parameter-us-west-2.json file, update SubnetIds and DBSecurityGroup with the names of resources that you created. Update the PrimaryRegionKMSKeyArn with the value of ElastiCacheKmsKeyId taken from the Outputs section of the AWS CloudFormation stack for the primary cluster:

{
  "Parameters": {
    "CacheNodeType": "cache.m5.large",
    "DBSecurityGroup": "sg-cccccccccc",
    "SubnetIds": "subnet-aaaaaaaaa,subnet-bbbbbbbbb",
    "EngineVersion": "5.0.6",
    "GlobalReplicationGroupIdSuffix": "demo-redis-global-datastore",
    "NumReplicas": "1",
    "NumShards": "1",
    "ReplicationGroupId": "demo-redis-cluster",
    "DBPortNumber": "3788",
    "TransitEncryption": "true",
    "KMSKeyAliasName": "elasticache/demo-redis-global-datastore-KmsKeyId",
    "PrimaryRegion": "us-east-1",
    "SecondaryRegion": "us-west-2"
  }
}
AWS administrator

Deploy your DB instance or cluster in the secondary Region.

Run the following commands, based on your database engine.

Amazon RDS

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-west-2 \
  --stack-name rds-mysql-app-stack \
  --template-file RDS-MySQL-DR.yaml \
  --parameter-overrides file://RDS-MySQL-parameter-us-west-2.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Amazon Aurora

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-west-2 \
  --stack-name aurora-mysql-app-stack \
  --template-file Aurora-MySQL-DR.yaml \
  --parameter-overrides file://Aurora-MySQL-parameter-us-west-2.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Amazon ElastiCache (Redis OSS)

cd <YOUR-LOCAL-GIT-FOLDER>/App-Stack
aws cloudformation deploy \
  --region us-west-2 \
  --stack-name elasticache-ds-app-stack \
  --template-file ElastiCache-DR.yaml \
  --parameter-overrides file://ElastiCache-parameter-us-west-2.json \
  --capabilities CAPABILITY_AUTO_EXPAND CAPABILITY_NAMED_IAM CAPABILITY_IAM \
  --disable-rollback

Verify that the AWS CloudFormation resources deployed successfully.

AWS administrator

Related resources