Performing a cross-region Failback - AWS Elastic Disaster Recovery

Performing a cross-region Failback

AWS Elastic Disaster Recovery (AWS DRS) allows you to fail over and fail back your EC2-based applications from one AWS Region to another. The failover process is the same as failing over into an AWS Region from a source outside of AWS. However, the failback process is different when both the source site and the failover site are AWS Regions. The instructions below describe the complete cross-Region failover and failback process. The examples use us-east-1 as the source AWS Region and us-east-2 as the recovery AWS Region, but any combination of AWS Regions supported by DRS will work.

Note

You can also use these instructions to replicate from one Availability Zone (AZ) to another. Ensure that you correctly configure the subnet in your EC2 launch template for cross-AZ replication. Learn more about AZs.

Setup

The first thing you need to do is initialize DRS in both your recovery AWS Region and your source AWS Region.

You should test the full failover and failback process as part of setting up DRS, and make sure you are familiar with how everything works, so that you are prepared to respond quickly when needed. The source AWS Region is where you normally have the EC2 instances on which your applications run. The recovery AWS Region is the AWS Region into which you will be replicating these EC2 instances, and where you will launch Recovery instances (for testing, or when you have a real event that impacts your source applications). You will use the AWS DRS console in the recovery AWS Region for day-to-day operations of DRS, and for launching Recovery instances. The recovery AWS Region is where most of the cost of operating the DRS service will be incurred. You will only use the DRS console in the source AWS Region for failback purposes, and you will only incur DRS-related costs in the source AWS Region during failback replication.
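The initialization in both Regions can also be scripted. The following is a minimal sketch, assuming the boto3 `drs` client and its `initialize_service` call; the helper name `initialize_drs` and the `make_client` parameter are hypothetical names introduced for this illustration. The console wizard also walks you through default settings, so verify that the API call alone covers what your setup needs.

```python
from typing import Callable, Iterable, List


def initialize_drs(make_client: Callable[[str], object],
                   regions: Iterable[str]) -> List[str]:
    """Call InitializeService once in each Region and return the Regions handled.

    make_client is any callable that returns a DRS client for a Region name,
    for example: lambda r: boto3.client("drs", region_name=r)
    """
    initialized = []
    for region in regions:
        client = make_client(region)
        client.initialize_service()  # the call takes no arguments
        initialized.append(region)
    return initialized
```

With boto3 installed and credentials configured, a call might look like `initialize_drs(lambda r: boto3.client("drs", region_name=r), ["us-east-2", "us-east-1"])`. Passing the client factory as a parameter keeps the helper easy to exercise with a stub before running it against a real account.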

Cross-region Failback

After you have performed a failover, your applications are running temporarily on EC2 instances in the recovery AWS Region (us-east-2 in our example). To perform a failback, you first copy these EC2 instances back to your original source AWS Region (us-east-1 in our example). To do this, you first need to make DRS stop recognizing these instances as Recovery instances, so that you can use them as source instances, replicate them back to your original source AWS Region, and re-launch them there.

  1. Initialize DRS in the source AWS Region.

    1. Do this ahead of time, and test both failover and failback in advance, to ensure that your solution will work in case of a real event.

    2. Open the AWS Management Console and choose the source AWS Region (us-east-1 in our example), then search for DRS, and follow the on-screen instructions to initialize the service.

  2. Disassociate the launched Recovery instances from DRS

    1. In the DRS Console in the recovery AWS Region (us-east-2 in our example), navigate to the Recovery Instances page and choose the Recovery instances you want to fail back.

    2. From the Actions menu, choose Disconnect from AWS. This deletes the AWS Replication Agent from the selected Recovery instance or instances and removes the replication resources associated with them from Elastic Disaster Recovery, but keeps the Recovery instance resources in Elastic Disaster Recovery and the EC2 resources intact.

    3. Choose the same Recovery instances, and, from the Actions menu, choose Delete Recovery instances.

      This will not delete the underlying EC2 instance, but rather will only delete its representation in the DRS Console, completely disassociating the EC2 instance from the AWS DRS service.
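The two console actions in step 2 correspond to two DRS API calls. Below is a minimal sketch, assuming a boto3 `drs` client created in the recovery AWS Region and its `disconnect_recovery_instance` and `delete_recovery_instance` calls; the helper name is hypothetical. It assumes the disconnect has taken effect by the time the delete is issued, so a real script may need to wait or retry between the two calls.

```python
from typing import Iterable, List


def disassociate_recovery_instances(drs, recovery_instance_ids: Iterable[str]) -> List[str]:
    """Disconnect, then delete, each Recovery instance record in DRS.

    Only the DRS representation is removed; the underlying EC2 instances
    are not terminated by these calls.
    """
    released = []
    for recovery_instance_id in recovery_instance_ids:
        # Same order as the console steps: disconnect first, then delete.
        drs.disconnect_recovery_instance(recoveryInstanceID=recovery_instance_id)
        drs.delete_recovery_instance(recoveryInstanceID=recovery_instance_id)
        released.append(recovery_instance_id)
    return released
```

The `drs` argument would typically be `boto3.client("drs", region_name="us-east-2")` in the example used on this page.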

  3. Install agents and begin failback replication

    Install the AWS Replication Agent on each of the EC2 instances you want to fail back from the recovery AWS Region (us-east-2 in our example) to the original source AWS Region (us-east-1 in our example). During installation, you will be prompted for the name of the AWS Region into which you want to replicate. Enter the name of the original source AWS Region (us-east-1 in our example). Failback replication starts automatically, and all the AWS resources required for replication are created automatically in the source AWS Region (your failback target).

  4. Launch, validate, and redirect traffic

    During the failback cutover window, you need to:

    1. Re-launch the instances

    2. Disassociate these instances from DRS.

    3. Validate your applications are working as expected. If you run into any issues, you can relaunch the instances and drill again, or even abort the failback cutover window - your Recovery instances are still running in your recovery AWS Region (us-east-2 in our example).

      1. If you do relaunch, or abort the failback cutover window, you will need to manually terminate the launched instances, using the AWS EC2 Console.

    4. Once you have validated proper operation:

      1. We recommend enabling termination protection for these instances.

      2. Redirect traffic to these instances, which will now become your new primary instances. Traffic redirection is not done using DRS - use a service of your choice to manage this (consider using Amazon Route 53).
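The two actions at the end of step 4 can be sketched as follows, assuming a boto3 `ec2` client in the source AWS Region and, if you choose Amazon Route 53 for redirection, a simple A record. The helper names, the record name, and the IP address in the usage note are hypothetical. Your traffic setup may use alias records, weighted routing, or a different service entirely.

```python
from typing import Dict, Iterable, List


def enable_termination_protection(ec2, instance_ids: Iterable[str]) -> List[str]:
    """Turn on API termination protection for each new primary instance."""
    protected = []
    for instance_id in instance_ids:
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            DisableApiTermination={"Value": True},
        )
        protected.append(instance_id)
    return protected


def build_upsert_a_record(record_name: str, ip_address: str, ttl: int = 60) -> Dict:
    """Build a Route 53 ChangeBatch that points record_name at ip_address."""
    return {
        "Comment": "Redirect traffic to the new primary instance after failback",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ],
    }
```

The change batch would then be passed to the Route 53 client, for example `route53.change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=build_upsert_a_record("app.example.com.", "203.0.113.10"))`, where `zone_id` is your own hosted zone ID.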

  5. Re-protect your new source servers

    1. The new primary instances launched in the previous step are not yet themselves replicating to your recovery AWS Region. You need to install the AWS Replication Agent on them.

    2. Once you install the agent, these instances will appear as source servers in the DRS Console in the recovery AWS Region (us-east-2 in our example).

      Important

      Make sure to add a tag during agent installation, to identify them as the new source servers - the old source servers and these new ones will both appear in the DRS Console, and since they presumably have the same hostname, will be very hard to differentiate from one another without such a tag.

      Important

      Do not remove the old source servers yet from the DRS console. Doing so will also remove any Point-in-Time recovery points which you may still need for regulatory or other reasons. See the following cleanup section for details on how and when to remove the old source servers.
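Once the new source servers carry a tag, telling them apart from the old ones can be scripted. The sketch below assumes a boto3 `drs` client in the recovery AWS Region and its `describe_source_servers` call, whose items include `sourceServerID` and `tags`. The helper names and the tag key and value (`drs-role`, `new-primary`) are hypothetical; use whatever tag you added during agent installation.

```python
from typing import Dict, List, Tuple


def list_source_servers(drs) -> List[Dict]:
    """Collect all source servers visible to this client, following pagination."""
    items: List[Dict] = []
    token = None
    while True:
        kwargs = {"filters": {}}  # empty filter: return every source server
        if token:
            kwargs["nextToken"] = token
        page = drs.describe_source_servers(**kwargs)
        items.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return items


def split_by_tag(items: List[Dict], tag_key: str, tag_value: str) -> Tuple[List[str], List[str]]:
    """Return (tagged_ids, untagged_ids) based on one tag key/value pair."""
    tagged, untagged = [], []
    for item in items:
        tags = item.get("tags") or {}
        target = tagged if tags.get(tag_key) == tag_value else untagged
        target.append(item["sourceServerID"])
    return tagged, untagged
```

For example, `split_by_tag(list_source_servers(drs), "drs-role", "new-primary")` would return the IDs of the new source servers first and all remaining source servers second.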

  6. Cleanup and return to normal operation

    1. Once you have completed the failover to failback cycle, there may be multiple AWS resources left behind that you no longer need, and that are costing you money:

      1. Original EC2 instances (in the source AWS Region, us-east-1 in our example)

        These have now been replaced by the new instances launched in step 4 above. You should have stopped these instances once you performed the failover. Now you can terminate them using the AWS EC2 Console (in us-east-1 in our example).

      2. Recovery (failover) instances

        These instances, launched in your recovery AWS Region (us-east-2 in our example), are no longer needed now that you have launched new primary instances in your original source AWS Region (us-east-1 in our example). These instances need to be terminated directly from the AWS EC2 Console in your recovery AWS Region (us-east-2 in our example) - they are no longer represented in the Recovery Instances page of the DRS Console.

        Important

        Do not yet remove the DRS source servers that were created when you installed agents on these servers to begin failback replication - the source server resource is also what maintains the Point-In-Time recovery points which you may still need, until your new primary source servers have been running long enough to create the Points-In-Time you need to maintain for regulatory or other reasons.

      3. Original AWS DRS Source Servers

        These appear in the DRS Console in your recovery AWS Region. They can be easily confused with the new source servers, so ensure you are acting on the correct ones (see the tagging recommendation above). Before you remove these, also make sure that the new DRS source servers have all the recovery Points-In-Time that you need. Once you are ready to remove them, use the Disconnect from AWS option in the commands menu in the AWS DRS Console in the recovery AWS Region (us-east-2 in our example).

      4. Failback Source Servers

        These are the resources that were created in the DRS Console in the source AWS Region (us-east-1 in our example) when you started failback replication. Before you remove these, make sure that the new DRS source servers have all the recovery Points-In-Time that you need. Once you are ready to remove them, use the Disconnect from AWS option in the commands menu in the DRS Console in the source AWS Region (us-east-1 in our example).
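The check "do the new source servers have enough recovery points yet?" can be approximated in code before any old source server is disconnected. This is a rough sketch only: it assumes you have already converted the snapshot timestamps (for example, from the boto3 `describe_recovery_snapshots` call) into timezone-aware `datetime` objects, and it treats the age of the oldest snapshot as a stand-in for your retention requirement, which may be too simple for regulatory purposes. The helper names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable


def history_covers(snapshot_times: Iterable[datetime],
                   required: timedelta,
                   now: datetime) -> bool:
    """Return True if the oldest snapshot is at least `required` old."""
    times = list(snapshot_times)
    if not times:
        return False  # no recovery points yet, so nothing is covered
    return now - min(times) >= required


def disconnect_old_source_server(drs, source_server_id: str, delete: bool = False) -> None:
    """Disconnect an old source server; optionally delete its DRS record too.

    Deleting is an extra step not described in the text above and is
    therefore off by default in this sketch.
    """
    drs.disconnect_source_server(sourceServerID=source_server_id)
    if delete:
        drs.delete_source_server(sourceServerID=source_server_id)
```

A cleanup script could call `history_covers` for each new source server and only call `disconnect_old_source_server` for the matching old server once it returns `True`.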

Note

Performing cross-Region replication, failover, and failback accrues additional costs that are not detailed in the DRS pricing examples. These additional costs consist of cross-Region data transfer costs during initial data replication, ongoing data replication, and failback replication; the cost of replication resources (such as EBS volumes and snapshots) used for failback replication; and the DRS hourly billing for failback source servers.