DR for cloud-native workloads - AWS Prescriptive Guidance

DR for cloud-native workloads

Consider how your cloud-native workloads align to your DR objectives. AWS provides multiple Availability Zones in Regions around the world. Many enterprises using the AWS Cloud align their workload architectures and DR objectives to withstand the loss of an Availability Zone. The Reliability Pillar in the AWS Well-Architected Framework supports this best practice. You can architect your workloads, along with their service and application dependencies, to use multiple Availability Zones. You can then automate your DR and achieve your DR objectives with little or no manual intervention.

In practice, however, you might be unable to establish a redundant, active, and automated architecture for all of your components. Examine every layer of your architecture to determine the DR processes that are necessary to achieve your objectives. These processes might vary from workload to workload, depending on each workload's architectural and service requirements. This guide covers considerations and options for Amazon EC2. For other AWS services, refer to the AWS documentation to determine high availability and DR options.

DR for Amazon EC2 in a single Availability Zone

Try to architect your workloads to actively support and service clients from multiple Availability Zones. You can use Amazon EC2 Auto Scaling and Elastic Load Balancing to achieve a Multi-AZ server architecture for Amazon EC2 and other services.

If your architecture has EC2 instances that can’t be load balanced and can have only a single instance running at any given moment, you can use either of the following options.

  • Create an Auto Scaling group that has a minimum, maximum, and desired size of 1 and is configured for multiple Availability Zones. Create an AMI that can be used to replace the instance if it fails. Make sure that you define the proper automation and configuration so that a newly provisioned instance from the AMI can be automatically configured and provide service. Create a load balancer that points to the Auto Scaling group and is configured for multiple Availability Zones. Optionally, create an Amazon Route 53 alias that points to the load balancer endpoint.

  • Create a Route 53 record for your active instance and have your clients connect by using this record. Create a script that creates a new AMI of your active instance and uses that AMI to provision a new EC2 instance, in the stopped state, in a separate Availability Zone. Configure the script to run periodically and to terminate the stopped instance that the previous run created. If there is an Availability Zone failure, start the backup instance in the alternative Availability Zone, and then update the Route 53 record to point to that instance.
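
As a minimal sketch of the two options above, the following Python helpers build request parameters shaped for the boto3 calls create_auto_scaling_group (Amazon EC2 Auto Scaling) and change_resource_record_sets (Route 53). The function names, the use of a launch template and a target group, and the 300-second health check grace period are illustrative assumptions, not part of this guidance. The helpers only build dictionaries and make no AWS calls.

```python
def single_instance_asg_params(name, launch_template_id, subnet_ids, target_group_arn):
    """Build kwargs for a one-instance Auto Scaling group that spans several AZs.

    Assumption: each subnet ID belongs to a different Availability Zone. This
    helper cannot verify that; it only checks that more than one subnet is given.
    """
    if len(subnet_ids) < 2:
        raise ValueError("pass subnets from at least two Availability Zones")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        # Minimum, maximum, and desired size of 1: exactly one instance runs,
        # and a failed instance is replaced in any of the listed subnets.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        "VPCZoneIdentifier": ",".join(subnet_ids),
        "TargetGroupARNs": [target_group_arn],
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,  # hypothetical value; tune to your boot time
    }


def repoint_record_change_batch(record_name, new_ip, ttl=60):
    """Build a Route 53 ChangeBatch that points an A record at the standby instance."""
    return {
        "Comment": "Point clients at the standby instance",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_ip}],
                },
            }
        ],
    }
```

With boto3 installed and credentials configured, the first dictionary could be passed as keyword arguments to the Auto Scaling client's create_auto_scaling_group call, and the second as the ChangeBatch argument of the Route 53 client's change_resource_record_sets call, together with your hosted zone ID. A short TTL is used here so that clients pick up the new address quickly after a failover.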

Test your solution thoroughly by simulating the failure that the solution was designed to protect against. Also consider the updates that your DR solution will need as your workload architecture changes.

DR for Amazon EC2 in a regional failure

Customers with very high availability requirements (for example, mission-critical applications that cannot tolerate any downtime) can use AWS across multiple Regions to provide further resiliency against issues at the Region level. Customers must carefully weigh the complexity, cost, and effort required to establish and maintain a multi-Region DR plan against its benefits. AWS provides features that support multi-Region architectures for global availability, failover, and DR. This guide covers a few of the available features that are specific to backup and recovery for Amazon EC2.

Amazon Machine Images (AMIs) and Amazon EBS snapshots are regional resources that can be used to provision new instances within a single Region. However, you can copy your snapshots and AMIs to another Region and use them to provision new instances in that Region. To support a DR plan for regional failure, you can automate the process of copying AMIs and snapshots to other Regions. AWS Backup and Amazon Data Lifecycle Manager support cross-Region copying as a part of your backup configuration.
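
If you script the copy yourself instead of using AWS Backup or Amazon Data Lifecycle Manager, the cross-Region copy described above might look like the following sketch. It assumes a boto3 EC2 client that was created for the destination Region and uses that client's copy_image call. The function names and the naming scheme for the copied AMI are hypothetical.

```python
import datetime


def dr_copy_name(source_name: str, source_region: str, day: datetime.date) -> str:
    """Return a hypothetical, date-stamped name for the copied AMI."""
    return f"{source_name}-dr-from-{source_region}-{day.isoformat()}"


def copy_ami_to_region(dest_ec2, source_image_id, source_region, name):
    """Copy an AMI into the Region that dest_ec2 is bound to.

    dest_ec2 is assumed to be a boto3 EC2 client for the destination (DR)
    Region, for example boto3.client("ec2", region_name="us-west-2").
    Returns the ID of the new AMI in the destination Region.
    """
    response = dest_ec2.copy_image(
        Name=name,
        SourceImageId=source_image_id,
        SourceRegion=source_region,
        Description=f"DR copy of {source_image_id} from {source_region}",
    )
    return response["ImageId"]
```

The copy runs asynchronously, so the returned AMI is not usable until it reaches the available state. A script like this would also need a retention step to deregister old copies, which is omitted here.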

AWS Elastic Disaster Recovery continuously replicates your Amazon EC2 servers from one Region to an alternate DR Region and automates their recovery. It can simplify your multi-Region DR approach and help you to regularly test your cross-Region Amazon EC2 DR plan by using drills. Consider Elastic Disaster Recovery when backup and recovery cannot meet your recovery time objective (RTO) and recovery point objective (RPO): it can help you lower your RTO to minutes and your RPO into the sub-second range.
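
To judge whether periodic backups can meet an RPO, a rough worst-case estimate can help. The sketch below rests on a simplified assumption that is not taken from this guidance: data written just after one backup starts is protected only after the next backup has been taken and copied to the DR Region, so the exposure is about one backup interval plus the time to complete and copy a backup. The function names are hypothetical.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval, copy_duration):
    """Estimate worst-case data loss for periodic backups copied to a DR Region.

    Simplified model: one full backup interval plus the time needed for a
    backup to complete and arrive in the DR Region.
    """
    return backup_interval + copy_duration


def backups_meet_rpo(backup_interval, copy_duration, rpo_objective):
    """Return True if the estimated worst case is within the RPO."""
    return worst_case_rpo(backup_interval, copy_duration) <= rpo_objective
```

For example, under this model, backups taken every 4 hours that need 30 minutes to copy give a worst case of 4 hours 30 minutes, which would not satisfy an RPO of 1 hour. In that situation, you would need more frequent backups or continuous replication.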

Whichever solution you use, you must determine the provisioning, failover, and failback processes to use in the event of an outage. You can use Route 53 health checks and DNS failover to help support your solution.
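
As one hedged illustration of DNS failover, the following Python sketch builds a Route 53 ChangeBatch that contains a primary and a secondary failover record. It assumes that a health check for the primary endpoint already exists. The function names, the SetIdentifier scheme, and the default TTL are hypothetical, and the code builds dictionaries only.

```python
def failover_record_change(name, ip, role, health_check_id=None, ttl=60):
    """Build one UPSERT change for a Route 53 failover A record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be 'PRIMARY' or 'SECONDARY'")
    record = {
        "Name": name,
        "Type": "A",
        # SetIdentifier distinguishes records that share a name and type.
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id, ttl=60):
    """Build a ChangeBatch with a health-checked primary and a secondary record."""
    return {
        "Comment": "Primary and secondary failover records",
        "Changes": [
            failover_record_change(name, primary_ip, "PRIMARY", primary_health_check_id, ttl),
            failover_record_change(name, secondary_ip, "SECONDARY", None, ttl),
        ],
    }
```

The resulting dictionary could be passed as the ChangeBatch argument of the boto3 Route 53 client's change_resource_record_sets call. When the health check on the primary record fails, Route 53 answers queries with the secondary record instead. Failback still needs its own documented process.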