Performing a cross-region Failback - AWS Elastic Disaster Recovery

AWS Elastic Disaster Recovery (AWS DRS) allows you to fail over and fail back your EC2-based applications from one AWS Region to another. The failover process is the same as failing over into an AWS Region from a source outside of AWS, but the failback process is different. The instructions below describe the complete cross-Region failover and failback process. The examples use us-east-1 as the source AWS Region and us-east-2 as the recovery AWS Region, but any combination of AWS Regions supported by DRS will work.

Overview and prerequisites

The failback process starts after the failover process ends. During failover, DRS allows you to replace the EC2 source instance (A1) with the EC2 recovered instance (B3). The current AWS resource state is illustrated in the following diagram:

After performing a recovery, your applications are running on EC2 instances in the recovery Region. However, these recovered instances (marked B3 in the diagram above) are not protected against other potential outages. To avoid data loss, start reversed replication immediately. Reversed replication copies the data from the EC2 recovered instances (B3) back to the original Region, an operation that takes time and incurs cross-Region data transfer costs.

Once replication has reached a healthy state, you can fail back to the source Region using the DRS console in that Region, provided DRS has been initialized there.
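As a rough illustration of checking for a "healthy state" outside the console, the sketch below filters items shaped like the `items` list returned by the DRS `DescribeSourceServers` API. Treating the `CONTINUOUS` data replication state as the equivalent of the healthy state mentioned above is an assumption, and `healthy_source_servers` and the sample IDs are hypothetical.

```python
from typing import Iterable, List, Mapping


def healthy_source_servers(items: Iterable[Mapping]) -> List[str]:
    """Return the IDs of source servers whose replication state is CONTINUOUS.

    Assumption: CONTINUOUS is the API-side counterpart of the healthy
    replication state described in the text.
    """
    healthy = []
    for item in items:
        info = item.get("dataReplicationInfo", {})
        if info.get("dataReplicationState") == "CONTINUOUS":
            healthy.append(item["sourceServerID"])
    return healthy


if __name__ == "__main__":
    # Made-up sample in the shape of describe_source_servers()["items"].
    sample = [
        {"sourceServerID": "s-example-a",
         "dataReplicationInfo": {"dataReplicationState": "CONTINUOUS"}},
        {"sourceServerID": "s-example-b",
         "dataReplicationInfo": {"dataReplicationState": "INITIAL_SYNC"}},
    ]
    print(healthy_source_servers(sample))
```

With a real client, the input would come from something like `boto3.client("drs", region_name="us-east-1").describe_source_servers(filters={})["items"]`.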

Important
  • To ensure operational continuity, initialize DRS in advance in both the source and target AWS Regions, and conduct regular failover and failback drills.

  • Before starting a failback, make sure the EC2 recovered instances (B3) have a network interface and meet the specified network requirements.
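The network-interface part of this check can be approximated in code. The sketch below scans a response shaped like the EC2 `DescribeInstances` output and lists instances that report no network interface. It does not verify the network requirements themselves, and `instances_without_network_interface` is a hypothetical helper name.

```python
from typing import List, Mapping


def instances_without_network_interface(response: Mapping) -> List[str]:
    """Return IDs of instances whose NetworkInterfaces list is missing or empty."""
    missing = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if not instance.get("NetworkInterfaces"):
                missing.append(instance["InstanceId"])
    return missing


if __name__ == "__main__":
    # Made-up sample in the shape of an EC2 describe_instances() response.
    sample = {
        "Reservations": [
            {"Instances": [
                {"InstanceId": "i-example-1",
                 "NetworkInterfaces": [{"NetworkInterfaceId": "eni-example"}]},
                {"InstanceId": "i-example-2", "NetworkInterfaces": []},
            ]}
        ]
    }
    print(instances_without_network_interface(sample))
```

In practice the response would come from an EC2 client in the recovery Region, for example `boto3.client("ec2", region_name="us-east-2").describe_instances(...)` called with the IDs of the recovered instances (B3).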

Performing cross-region failback

  1. Start reversed replication.

    1. Go to the recovery AWS Region (in this example, us-east-2).

    2. Choose the AWS Elastic Disaster Recovery service.

    3. Navigate to the Recovery instances page.

    4. Select the servers that you want to protect, and click Start reversed replication.

    5. A Source Server (A2) will be created in the source region, as shown in the following diagram.

      Note

      All server data is transferred over the wire during this step. This process can take some time and results in cross-Region data transfer costs. Moreover, starting reversed replication creates additional replication resources (A2). To avoid double billing, you can stop replicating the source instances (A1): navigate to the DRS Source server (B1) in the recovery Region and click Stop replication in the Replication drop-down menu. Stop replication only after validating the failover instances, because once replication is stopped, all previous points in time are deleted.

      Important

      Once replication is stopped, all previous points in time are deleted. This is done to minimize costs.

  2. Launch, validate, and redirect traffic.

    After the Reversed direction launch state is marked as Ready, take the following steps to complete the failback:

    1. Find the relevant Source Servers (A2) in the source region by clicking the Replication to source server link in the Recovery instance (B2).

      Note

      You can also find them directly on the Source Servers page of the DRS console in the source Region.

    2. If the state is Ready (or Ready with lag), click Launch for failback under Initiate recovery job.

      Important

      Make sure that your applications (A4) are working as expected. If you run into any issues, you can relaunch the instances and try again. Until you opt to fail back, your Recovery instances (B3) continue to run in your recovery AWS Region to ensure business continuity.

    3. Redirect traffic to failed back instances (A4), which will now become your new primary instances. Traffic redirection is not conducted using DRS. Choose a service according to your preferences (consider using Amazon Route 53).

  3. Protect your new failed back instances.

    Important

    Do not perform this step when performing a drill. This step replaces the instances that DRS replicates (from the Source instances, A1, to the Failed back instances, A4). In a drill, the source instances (A1) are still your production environment.

    The newly launched Failed-back instances (A4) are not protected. In order to protect them, follow these steps:

    1. Navigate to the Recovery instance (A3) in the source region.

    2. Click the Start reversed replication button. This step will replace the Instances that the Source Server (B1) protects (A4 instead of A1).

  4. Clean your environment.

    After the failover to failback cycle is complete, you may be left with multiple AWS resources that you no longer need and that are costly to maintain. These include the source and failover EC2 instances (A1,B3), the Recovery instances (B2, A3), and the Source servers (A2). Consider removing them.

    Cleanup steps:

    1. Stop replication on the Source servers (A2) of the source region.

      Navigate to the Source Server in the source region (A2), and click on Stop replication under the Replication menu. This step is required before terminating the recovery instance (B2).

    2. Terminate the Recovery instances (B2).

      These instances, launched in your recovery AWS Region, are no longer needed now that you have launched new primary instances in your original source AWS Region. To terminate these instances, navigate to the AWS DRS Console in your recovery AWS Region (B2). After termination, those instances will no longer appear in the Recovery Instances page of the DRS Console. This process also terminates the Recovered EC2 instances (B3).

    3. Terminate the source region EC2 instances (A1).

      These have now been replaced by the new instances launched in step 2 above (EC2 failed back instances, A4). You might have stopped these instances after the failover, and you can now terminate them using the Amazon EC2 console.

    4. Remove the Recovery instances (A3) in the source region.

      Navigate to the Recovery instances page in the DRS console. Select the relevant Recovery instances and click Disconnect from AWS under the Actions drop-down menu. Then click Delete server under the same Actions drop-down menu.

    5. Remove the Source servers (A2) in the source region.

      Navigate to the Source servers page in the DRS console. Select the relevant Source servers and click Disconnect from AWS under the Actions drop-down menu. Then click Delete server under the same Actions drop-down menu.
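The cleanup order in step 4 can be sketched as a sequence of API calls. This is a minimal illustration, not a definitive implementation: it assumes the console actions map to the boto3 DRS operations `stop_replication`, `terminate_recovery_instances`, `disconnect_recovery_instance`, `delete_recovery_instance`, `disconnect_source_server`, and `delete_source_server`, plus EC2 `terminate_instances`. The function name, parameter names, and `DryRunClient` are hypothetical, and the sketch does not wait for each operation to finish before issuing the next.

```python
from typing import Any, Dict, List, Sequence, Tuple

Call = Tuple[str, str, Dict[str, Any]]


class DryRunClient:
    """Stand-in for a boto3 client: records each call instead of sending it."""

    def __init__(self, label: str, log: List[Call]):
        self._label = label
        self._log = log

    def __getattr__(self, operation: str):
        def record(**kwargs: Any) -> Dict[str, Any]:
            self._log.append((self._label, operation, kwargs))
            return {}
        return record


def cleanup_after_failback(source_drs, recovery_drs, source_ec2, *,
                           a2_source_servers: Sequence[str],
                           b2_recovery_instances: Sequence[str],
                           a1_ec2_instances: Sequence[str],
                           a3_recovery_instances: Sequence[str]) -> None:
    # 1. Stop replication on the Source servers (A2) in the source Region.
    for server_id in a2_source_servers:
        source_drs.stop_replication(sourceServerID=server_id)
    # 2. Terminate the Recovery instances (B2) in the recovery Region.
    if b2_recovery_instances:
        recovery_drs.terminate_recovery_instances(
            recoveryInstanceIDs=list(b2_recovery_instances))
    # 3. Terminate the original source EC2 instances (A1).
    if a1_ec2_instances:
        source_ec2.terminate_instances(InstanceIds=list(a1_ec2_instances))
    # 4. Disconnect, then delete, the Recovery instances (A3).
    for instance_id in a3_recovery_instances:
        source_drs.disconnect_recovery_instance(recoveryInstanceID=instance_id)
        source_drs.delete_recovery_instance(recoveryInstanceID=instance_id)
    # 5. Disconnect, then delete, the Source servers (A2).
    for server_id in a2_source_servers:
        source_drs.disconnect_source_server(sourceServerID=server_id)
        source_drs.delete_source_server(sourceServerID=server_id)


if __name__ == "__main__":
    log: List[Call] = []
    cleanup_after_failback(
        DryRunClient("drs us-east-1", log),
        DryRunClient("drs us-east-2", log),
        DryRunClient("ec2 us-east-1", log),
        a2_source_servers=["s-example-a2"],
        b2_recovery_instances=["i-example-b2"],
        a1_ec2_instances=["i-example-a1"],
        a3_recovery_instances=["i-example-a3"],
    )
    for entry in log:
        print(entry)
```

Running the file as-is only prints the planned calls. To act on real resources, the three `DryRunClient` objects would be replaced with `boto3.client("drs", region_name=...)` and `boto3.client("ec2", region_name=...)` clients for the matching Regions.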

Performing a drill

To conduct a drill, follow steps 1 and 2 as described above, and then perform the different cleanup process described below.

Note
  1. Do not stop replication on the source server (B1) in the recovery AWS Region, even though the note in step 1, substep 5 suggests it.

  2. Do not perform step 3. Protecting the failed back instances would affect your production data.
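Steps 1 and 2, which a drill shares with a real failback, can be sketched against the DRS API as follows. This is an illustration under assumptions: it supposes that Start reversed replication corresponds to the `reverse_replication` operation and that Launch for failback corresponds to `start_recovery` on the Source servers (A2) in the source Region. Whether a failback drill should set `isDrill` is not stated in this guide, so the sketch leaves that choice to the caller. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def start_reversed_replication(recovery_drs,
                               recovery_instance_ids: Sequence[str]) -> Dict[str, Optional[str]]:
    """Step 1: start reversed replication for each Recovery instance (B2).

    Returns a mapping from Recovery instance ID to the ARN of the Source
    server (A2) that the call reports for the source Region.
    """
    arns: Dict[str, Optional[str]] = {}
    for instance_id in recovery_instance_ids:
        response = recovery_drs.reverse_replication(recoveryInstanceID=instance_id)
        arns[instance_id] = response.get("reversedDirectionSourceServerArn")
    return arns


def launch_for_failback(source_drs, source_server_ids: Sequence[str],
                        is_drill: bool) -> str:
    """Step 2: launch instances from the Source servers (A2); returns the job ID.

    Assumption: the console's Launch for failback maps to start_recovery.
    """
    response = source_drs.start_recovery(
        isDrill=is_drill,
        sourceServers=[{"sourceServerID": sid} for sid in source_server_ids],
    )
    return response["job"]["jobID"]
```

Here `recovery_drs` would be a DRS client for the recovery Region (us-east-2 in this example) and `source_drs` a DRS client for the source Region (us-east-1). The second call should only be made once the reversed replication is ready, as described in step 2.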

Cleaning up after a drill

After a successful drill your AWS environment should look like this:

The only two AWS resources that need to remain are your actual production environment (A1) and its replication backup (B1). Since DRS protects replication servers, you must stop the replication first.

  1. Stop the replication of the Source servers (A2) in the Source region.

    Important

    Make sure you don’t stop replicating the Source servers (B1) in the recovery region.

  2. Terminate the Recovery Instances (A3) in the Source region and the Recovery Instances (B2) in the recovery region. As a result of this action, both the recovered instances (B3) and the Failed back Instances (A4) are terminated as well.
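The drill cleanup above can be expressed as an ordered plan that is reviewed before anything is executed. The sketch below only builds that plan and refuses to include any server listed as a B1 Source server, mirroring the warning in step 1. The operation names are those of the boto3 DRS client; `drill_cleanup_plan`, its parameters, and the "source"/"recovery" labels are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Step = Tuple[str, str, Dict[str, Any]]  # (Region label, operation, arguments)


def drill_cleanup_plan(a2_source_servers: Sequence[str],
                       a3_recovery_instances: Sequence[str],
                       b2_recovery_instances: Sequence[str],
                       b1_source_servers: Sequence[str]) -> List[Step]:
    """Build the ordered cleanup steps for a drill without calling AWS."""
    # Guard: replication of the production Source servers (B1) must keep running.
    overlap = set(a2_source_servers) & set(b1_source_servers)
    if overlap:
        raise ValueError(f"refusing to stop replication on B1 servers: {sorted(overlap)}")

    plan: List[Step] = []
    # 1. Stop replication of the Source servers (A2) in the source Region.
    for server_id in a2_source_servers:
        plan.append(("source", "stop_replication", {"sourceServerID": server_id}))
    # 2. Terminate the Recovery instances in both Regions (A3, then B2).
    if a3_recovery_instances:
        plan.append(("source", "terminate_recovery_instances",
                     {"recoveryInstanceIDs": list(a3_recovery_instances)}))
    if b2_recovery_instances:
        plan.append(("recovery", "terminate_recovery_instances",
                     {"recoveryInstanceIDs": list(b2_recovery_instances)}))
    return plan


if __name__ == "__main__":
    for step in drill_cleanup_plan(["s-example-a2"], ["i-example-a3"],
                                   ["i-example-b2"], ["s-example-b1"]):
        print(step)
```

Each tuple could then be dispatched to a DRS client for the labeled Region, for example with `getattr(client, operation)(**arguments)`.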

Note

Performing cross-Region replication, failover, and failback accrues additional costs that are not detailed in the DRS pricing examples. These consist of cross-Region data transfer costs during initial data replication, ongoing data replication, and failback replication; the cost of replication resources (such as EBS volumes and snapshots) used for failback replication; and the DRS hourly billing for failback source servers.