Disaster recovery scenarios - AWS Prescriptive Guidance

Disaster recovery scenarios

This section provides examples of a single Availability Zone or AWS Region failure, and discusses options for disaster recovery (DR). The examples assume a recovery point objective (RPO) of 15 minutes and a recovery time objective (RTO) of 4 hours.

Availability Zone failure

You can use one of the following options to recover from a single Availability Zone failure within the given parameters (RPO of 15 minutes, RTO of 4 hours).

  • Provision the application recovery by using the most recent Amazon Elastic Compute Cloud (Amazon EC2) image backup, and connect to the existing warm standby database instance through an Always On availability group deployment or log shipping.

  • If you have an application such as an SQL Server Reporting Services (SSRS) application that has a scaled-out deployment, the load balancer can redirect all traffic to the secondary node.

  • You can use Amazon EC2 base AMIs for the application and database server to provision the infrastructure. Databases can be restored in a new Availability Zone, depending on their size and backup frequency, from the most recent native backups (full backup, differential backup, or transaction log backups every 5 minutes) or by using EBS snapshots. This option meets the RPO and RTO requirements but requires custom scripting. You must also consider the time required to provision the infrastructure, and meeting RPO and RTO requirements can be challenging.

  • Amazon EC2 images (including EBS volumes) for both applications and the database server can be restored in a new Availability Zone. RPO can be challenging, depending on the most recent backup, but this option can be combined with the most recent transaction logs to meet requirements. This option supports Windows Volume Shadow Copy Service (VSS) snapshots.

Region failure

You can use one of the following options to recover from a single AWS Region failure within the given parameters (RPO of 15 minutes, RTO of 4 hours).

  • You can use Amazon EC2 base Amazon Machine Images (AMIs) for the application and database server to provision the infrastructure. Databases can be restored in a new Region, depending on their size and backup frequency, from the most recent native backups (full backup, differential backup, or transaction log backups every 5 minutes). This option meets the RPO and RTO requirements but requires custom scripting.

    • SQL Server log shipping as a DR solution requires a manual failover to a standby server and depends on the frequency of log backups. This is one of the least expensive DR options. SQL Server editions for the primary and log-shipped DR site do not need to match. This option meets the RPO (by using transaction log backups every 5 minutes) and RTO, but requires maintenance through manual, custom scripts. Large databases require long restoration times.

  • You can use an Amazon EC2 AMI for both the application and the database server and restore it to a target in a new Region. RPO depends on the size and frequency of the backups.

    • The most recent application images can be restored by using an AMI. You can use recent native differential or transaction log backups every 5 minutes to bring the database up to date to meet the RPO.

    • RTO depends on the size and time to transfer and restore the snapshots to the new Region, if the source isn’t already in sync with the target.

  • The solution with the least downtime is to restore the application backup image and have a warm standby SQL Server node in a remote Region by using a two-node, three-node, or four-node availability group setup (basic, classic, or distributed) and to connect to the standby database server after a failover. The synchronous-commit mode replica meets the RPO requirements, whereas asynchronous-commit mode replica might be delayed depending on the volume of transactions. You can use a distributed availability group configuration to scale out database nodes in a new Region, if needed. This configuration also reduces complexity because it uses two independently availability groups instead of a single availability group spread across Regions in either synchronous-commit or asynchronous-commit mode, and meets both RTO and RPO requirements comfortably. Alternatively, using SQL Server basic availability groups in the Standard edition is also an option. However, it has limitations because it supports only up to two nodes, and only one database can be in a single availability group although multiple availability groups are supported. You can set up SQL Server Standard edition within one Region or across Regions. This edition provides cost savings because it doesn’t charge for the secondary node, which isn’t accessible for read operations. SQL Server Enterprise edition provides full functionality, and supports the failover of all databases as a single availability group failover.

Common use cases

As a sizing exercise, 80% of SQL Server applications running on Amazon EC2 that have a normal online transaction processing (OLTP) workload can be grouped into one of three categories based on how critical they are:

  • SQL Server HA/DR with SQL Server backups, using two synchronous-commit replicas and one asynchronous-commit mode replica

  • AWS Backup HA/DR with SQL Server backups, using an Amazon EC2 AMI for both the application and the database, and Amazon EBS storage

  • AWS Backup HA/DR with SQL Server backups, using an Amazon EC2 base AMI for the database server, an Amazon EC2 image for the application, and Amazon EBS snapshots

The following table provides details about each category.

  SQL Server HA/ DR with SQL Server backups AWS Backup HA/DR with AMIs, EBS storage, and SQL Server backups AWS Backup HA/DR with AMIs, EBS snapshots, and SQL Server backups

Restore process in case of a disaster

  • Restore Amazon EC2 base AMI for the application from AWS Backup

  • Fail over to the standby instance in the Region (in the case of Availability Zone failure) or to the cross-Region instance (in case of Region failure)

  • Meets RPO and RTO requirements

  • Restore Amazon EC2 images from backups for both the application and the database

  • Provides both in-Region and cross-Region support

  • Apply most recent SQL Server differential and transaction log backups (every 15 minutes) to meet RPO and RTO requirements for the database

  • Restore Amazon EC2 image from backup for the application

  • Restore Amazon EC2 base AMI for the database server

  • Restore EBS snapshots (if any)

  • Cluster has to be rebuilt

  • Provides both in-Region and cross-Region support

  • Apply most recent differential and transaction log backups to the database to meet RPO requirements, but RTO might not be met

Primary resources

  • Three SQL Server Enterprise edition licenses (passive HA and DR nodes license is free if you have an existing Software Assurance licensing agreement with Microsoft; see announcement)

  • Amazon EC2 backup space on Amazon Simple Storage Service (Amazon S3)

  • Cross-Region data transfer

  • One SQL Server license (any edition).

  • Amazon EC2 backup space on Amazon S3

  • SQL Server backups (differential and log files) on Amazon S3

  • Cross-Region data transfer

  • One SQL Server license (any edition).

  • Amazon EC2 backup space on Amazon S3

  • SQL Server backups (differential and log files) on Amazon S3

  • Cross-Region data transfer

HA/DR

Offers HA and DR

Offers DR only

Offers DR only

RPO

Failover is handled by SQL Server availability group (DR is manual)

Manual or custom scripted

Manual or custom scripted

RTO

Seconds to minutes

Minutes to hours

Multiple hours

Risk of missing SLAs

Low

Medium

High

Manageability

Simple

Medium

Medium

Scaling

Simple

Medium

Medium

File size limitations for uploads to Amazon S3 or cross-Region transfer

N/A – Handled in synchronous-commit mode or asynchronous-commit mode to a warm standby

Yes

Yes

Data loss

Near zero (depends on the workload and infrastructure provisioned)

Depends on the frequency of Amazon EC2 backup images and SQL Server backups

Depends on the frequency of Amazon EC2 backup images or EBS snapshots and SQL Server backups

Cost

Medium

Low – medium

Low – medium