What is Amazon Application Recovery Controller (ARC)?

Amazon Application Recovery Controller (ARC) helps you prepare for recovery, and recover faster, for applications running on AWS. ARC provides two sets of capabilities: multi-Availability Zone (AZ) recovery, which includes zonal shift and zonal autoshift, and multi-Region recovery, which includes routing control and readiness check. With ARC, you can use highly available recovery tools to quickly mitigate impairments that affect your multi-Region or multi-AZ applications. You can also use readiness check to gain insight into whether your applications and resources are prepared for recovery.

The AWS Global Cloud Infrastructure provides fault tolerance and resilience, with each AWS Region made up of multiple, fully isolated Availability Zones. ARC works within this AWS structure to help your applications be resilient.

Multi-AZ recovery

If your applications are built to take advantage of Availability Zones in AWS, you can quickly isolate and recover from AZ impairments by using zonal shift. A zonal shift temporarily moves traffic for a supported resource away from an AZ to healthy AZs in the same AWS Region. Starting a zonal shift helps your application recover quickly from, for example, a developer's bad code deployment or an AWS impairment in a single Availability Zone. By moving traffic away, you reduce the impact on clients who are using your application when there's an issue in one AZ.

You can start a zonal shift for any supported resource in your account in a Region. AWS services automatically register supported AWS resources with zonal shift in ARC, so that you can start a zonal shift at any time.
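As a rough illustration of starting a zonal shift programmatically, the following Python sketch assembles the request and sends it with boto3. It assumes the boto3 `arc-zonal-shift` client and its `start_zonal_shift` operation. The helper names, the load balancer ARN, the AZ ID, and the comment text are hypothetical values chosen for the example.

```python
from typing import Any, Dict


def build_zonal_shift_request(resource_arn: str, away_from_az_id: str,
                              expires_in: str, comment: str) -> Dict[str, Any]:
    """Assemble the keyword arguments for a StartZonalShift call."""
    return {
        "resourceIdentifier": resource_arn,  # a supported, registered resource
        "awayFrom": away_from_az_id,         # AZ ID to move traffic away from
        "expiresIn": expires_in,             # for example "4h" or "120m"
        "comment": comment,                  # free-text note about the shift
    }


def start_zonal_shift(request: Dict[str, Any], region: str) -> str:
    """Send the request and return the ID of the new zonal shift."""
    import boto3  # imported here so the builder works without AWS access

    client = boto3.client("arc-zonal-shift", region_name=region)
    response = client.start_zonal_shift(**request)
    return response["zonalShiftId"]


# Hypothetical values, for illustration only.
request = build_zonal_shift_request(
    resource_arn=("arn:aws:elasticloadbalancing:us-east-1:111122223333:"
                  "loadbalancer/app/example/0123456789abcdef"),
    away_from_az_id="use1-az1",
    expires_in="4h",
    comment="Shifting away from use1-az1 after a bad deployment",
)
```

Calling `start_zonal_shift(request, "us-east-1")` would then submit the shift, provided the caller has credentials and the resource is registered with zonal shift.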

Zonal autoshift is a capability in ARC that you can enable to authorize AWS to shift traffic away from an AZ for supported resources, on your behalf, to healthy AZs in the AWS Region. AWS starts an autoshift when internal telemetry indicates that there is an impairment in one AZ in a Region that could potentially impact customers. The internal telemetry incorporates metrics from multiple sources, including the AWS network, and the Amazon EC2 and Elastic Load Balancing services.

Zonal shifts and autoshifts are temporary. When you start a manual zonal shift, you must specify an expiration, which can initially be up to three days. To keep traffic away from an AZ for longer, you can update the zonal shift and set a new expiration. With zonal autoshift, AWS ends an autoshift when indicators show that there is no longer an issue or potential issue.
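The three-day limit on an expiration can be checked before a request is sent. The sketch below is a minimal local validator, assuming that expirations are written as a positive integer followed by `m` (minutes) or `h` (hours), such as `120m` or `20h`. The function and constant names are hypothetical, and the service itself remains the authority on what it accepts.

```python
import re

# Three days expressed in minutes, matching the limit described above.
MAX_EXPIRATION_MINUTES = 3 * 24 * 60

# Assumed duration format: a positive integer followed by "m" or "h".
_EXPIRES_IN = re.compile(r"([1-9][0-9]*)(m|h)")


def expires_in_minutes(expires_in: str) -> int:
    """Convert a duration string such as "90m" or "4h" to minutes."""
    match = _EXPIRES_IN.fullmatch(expires_in)
    if match is None:
        raise ValueError(f"unrecognized duration: {expires_in!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "h" else value


def is_valid_expiration(expires_in: str) -> bool:
    """Return True if the duration parses and is at most three days."""
    try:
        return expires_in_minutes(expires_in) <= MAX_EXPIRATION_MINUTES
    except ValueError:
        return False
```

With this check, `is_valid_expiration("72h")` passes while `is_valid_expiration("73h")` does not. To keep traffic away for longer, a new expiration could be set on the existing shift, for example with the boto3 `update_zonal_shift` operation.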

To learn more about these capabilities, see the chapters on zonal shift and zonal autoshift.

Multi-Region recovery

If you've designed an application so that it can continue operating from another AWS Region, you can use routing control for failover. Routing control lets you fail over traffic from one AWS Region to another when there's an issue, so that your application stays available. Routing control includes safety rules, which help protect you from unintended outcomes by imposing guardrails that you define. Using these rules, you can make sure, for example, that only one of your application replicas, active or standby, is enabled and in use at a time.
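To make the guardrail idea concrete, here is a small in-memory model of the example above, in which exactly one of two replicas may be enabled at a time. This is only a sketch of the concept, not the ARC safety rule configuration or API. The `ExactlyOneOn` class, the `apply_change` function, and the control names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ExactlyOneOn:
    """Guardrail: exactly one of the named routing controls may be on."""
    controls: FrozenSet[str]

    def allows(self, states: Dict[str, bool]) -> bool:
        on_count = sum(1 for name in self.controls if states.get(name, False))
        return on_count == 1


def apply_change(states: Dict[str, bool], changes: Dict[str, bool],
                 rules: List[ExactlyOneOn]) -> Dict[str, bool]:
    """Return the new states, or raise if any rule rejects the result."""
    proposed = {**states, **changes}  # evaluate rules against the end state
    for rule in rules:
        if not rule.allows(proposed):
            raise ValueError(f"change rejected by safety rule: {rule}")
    return proposed


rule = ExactlyOneOn(frozenset({"active", "standby"}))
states = {"active": True, "standby": False}

# Swapping both controls in one change keeps exactly one replica enabled.
states_after_failover = apply_change(
    states, {"active": False, "standby": True}, [rule])
```

In this model, turning on `standby` without turning off `active` raises an error, because the resulting state would have both replicas enabled. The rule is evaluated against the proposed end state, so a change that swaps both controls together is accepted.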

For multi-Region recovery, ARC can help you fail over DNS traffic across AWS Regions. The extremely reliable routing controls in ARC enable you to recover your application by rerouting traffic away from a Region with an impairment to a healthy Region.
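A failover of this kind might look like the following Python sketch, which turns off the routing control for the impaired Region and turns on the one for the healthy Region in a single batch update. It assumes the boto3 `route53-recovery-cluster` client and its `update_routing_control_states` operation, and it tries each cluster endpoint in turn. The helper names and the shape of the `cluster_endpoints` argument (dictionaries with `Endpoint` and `Region` keys) are assumptions for this example, as are the placeholder ARNs.

```python
from typing import Dict, List


def failover_entries(impaired_control_arn: str,
                     healthy_control_arn: str) -> List[Dict[str, str]]:
    """Build a batch that moves traffic from one Region's control to another."""
    return [
        {"RoutingControlArn": impaired_control_arn, "RoutingControlState": "Off"},
        {"RoutingControlArn": healthy_control_arn, "RoutingControlState": "On"},
    ]


def fail_over(entries: List[Dict[str, str]],
              cluster_endpoints: List[Dict[str, str]]) -> None:
    """Apply the batch through the first cluster endpoint that accepts it."""
    import boto3  # imported here so the builder works without AWS access

    last_error = None
    for endpoint in cluster_endpoints:
        try:
            client = boto3.client(
                "route53-recovery-cluster",
                endpoint_url=endpoint["Endpoint"],
                region_name=endpoint["Region"],
            )
            client.update_routing_control_states(
                UpdateRoutingControlStateEntries=entries)
            return
        except Exception as error:  # try the next endpoint on any failure
            last_error = error
    raise RuntimeError("no cluster endpoint accepted the update") from last_error


# Placeholder ARNs, for illustration only.
entries = failover_entries(
    "arn:aws:route53-recovery-control::111122223333:controlpanel/example/routingcontrol/region-a",
    "arn:aws:route53-recovery-control::111122223333:controlpanel/example/routingcontrol/region-b",
)
```

Sending both state changes in one batch means the two controls change together rather than in two separate calls, which fits naturally with safety rules that evaluate the combined result.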

With readiness check, ARC continually monitors AWS resource quotas, capacity, and network routing policies, and can notify you about changes that would affect your ability to fail over to a replica and recover. Continual readiness checks help make sure, on an ongoing basis, that you can maintain your multi-Region applications in a state that is scaled and configured to handle failover traffic. Readiness check is useful when you first configure ARC, and during normal application operation. Readiness check is not intended to be used in the critical path for failover during an event.

To learn more about these capabilities, see the chapters on routing control and readiness check.