Amazon Route 53 Application Recovery Controller components

This section defines the components of the zonal shift, zonal autoshift, readiness check, and routing control features in Amazon Route 53 Application Recovery Controller (Route 53 ARC).

Zonal shift components

The following diagram illustrates an example of a zonal shift shifting traffic away from an Availability Zone in an AWS Region. Safety rules built into Route 53 ARC prevent you from starting another zonal shift for a resource when it already has an active zonal shift.


Diagram of a zonal shift with three Availability Zones

The following are components of the zonal shift capabilities in Route 53 ARC.

Zonal shift

You start a zonal shift for a managed resource in your AWS account to temporarily move traffic away from an Availability Zone in an AWS Region. Supported AWS resources are registered with Route 53 ARC automatically, which makes them managed resources for zonal shift in your account. Currently, you can start a zonal shift only for Network Load Balancers and Application Load Balancers that have cross-zone load balancing turned off.
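As a minimal sketch, assuming you call the service from Python, the following hypothetical helper assembles the parameters for a customer-initiated zonal shift. The parameter names mirror the StartZonalShift API (resourceIdentifier, awayFrom, expiresIn, comment); the helper, the placeholder ARN, and the Availability Zone ID are illustrative only. Because the cross-zone load balancing setting isn't visible in an ARN, the sketch checks only the load balancer type.

```python
def build_start_zonal_shift_request(resource_arn: str, away_from_az_id: str,
                                    expires_in: str, comment: str) -> dict:
    """Hypothetical helper: assemble parameters for a StartZonalShift call."""
    # Elastic Load Balancing ARNs contain loadbalancer/app/<name>/<id> for an
    # Application Load Balancer and loadbalancer/net/<name>/<id> for a Network Load Balancer.
    if ":loadbalancer/app/" not in resource_arn and ":loadbalancer/net/" not in resource_arn:
        raise ValueError("only Application and Network Load Balancers are supported")
    return {
        "resourceIdentifier": resource_arn,
        "awayFrom": away_from_az_id,  # an Availability Zone ID such as "use1-az1"
        "expiresIn": expires_in,      # a duration such as "90m" or "2h"
        "comment": comment,
    }


request = build_start_zonal_shift_request(
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/my-alb/0123456789abcdef",
    "use1-az1",
    "2h",
    "Bad deployment in one Availability Zone",
)
# With boto3 installed and credentials configured, the request could then be sent with:
# boto3.client("arc-zonal-shift").start_zonal_shift(**request)
```

Validating the resource type before calling the service gives a clearer error than waiting for the API to reject an unsupported resource.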

Starting a zonal shift helps your application quickly recover, for example, from a developer's bad code deployment or from an AWS infrastructure failure in a single Availability Zone, reducing the impact and time lost from an issue in one zone.

Built-in safety rules

Safety rules built into Route 53 ARC prevent more than one traffic shift for a resource from being in effect at a time. That is, only one customer-initiated zonal shift, practice run zonal shift, or autoshift for the resource can be actively shifting traffic away from an Availability Zone. For example, if you start a zonal shift for a resource when it is currently shifted away with autoshift, your zonal shift takes precedence. For more information, see Zonal autoshift in Amazon Route 53 Application Recovery Controller and Outcomes for practice runs.

Resource identifier

The identifier for a resource to include in a zonal shift. The identifier is the Amazon Resource Name (ARN) for the resource.

You can include in a zonal shift only the resources in your account that belong to an AWS service supported by Route 53 ARC. Those AWS services register their resources with Route 53 ARC.

Note

You can only start a zonal shift for Network Load Balancers and Application Load Balancers with cross-zone load balancing turned off.

Managed resource

AWS services register resources automatically with Route 53 ARC for zonal shift. A resource that has been registered is a managed resource in Route 53 ARC.

Resource name

The name of a managed resource in Route 53 ARC.

Status (zonal shift status)

The status of a zonal shift. The Status can have one of the following values:

  • ACTIVE: The zonal shift is started and active.

  • EXPIRED: The zonal shift has expired (the expiry time was exceeded).

  • CANCELED: The zonal shift was canceled.
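The three status values can be modeled with a small, hypothetical function. It illustrates how the values relate rather than Route 53 ARC's implementation, and it assumes that a canceled shift keeps the CANCELED status even after its original expiry time passes.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class ZonalShiftStatus(Enum):
    ACTIVE = "ACTIVE"
    EXPIRED = "EXPIRED"
    CANCELED = "CANCELED"


def zonal_shift_status(expiry_time: datetime, canceled: bool, now: datetime) -> ZonalShiftStatus:
    """Hypothetical model: derive a zonal shift's status from its expiry time and cancellation."""
    if canceled:
        return ZonalShiftStatus.CANCELED
    if now >= expiry_time:
        # The expiry time was exceeded.
        return ZonalShiftStatus.EXPIRED
    return ZonalShiftStatus.ACTIVE
```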

Applied status

An applied status indicates whether a traffic shift is in effect for a resource. The shift that has the status APPLIED determines which Availability Zone traffic for the resource is shifted away from, and when that traffic shift ends.

Expiry time (expiration time)

The expiry time (expiration time) for a zonal shift. Zonal shifts are temporary. For a customer-initiated zonal shift, you can initially set a zonal shift to be active for up to three days (72 hours).

When you start a zonal shift, you specify how long you want it to be active, and Route 53 ARC converts that duration to an expiry time (expiration time). You can cancel a customer-initiated zonal shift, for example, when you're ready to restore traffic to the Availability Zone. You can also extend a customer-initiated zonal shift by updating it with a new length of time until it expires.
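The conversion from a duration to an expiry time can be sketched as follows. The sketch assumes the duration format that the zonal shift API accepts, a whole number followed by m (minutes) or h (hours), such as "90m" or "20h"; the function name is hypothetical. It also enforces the 72-hour initial limit described above.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_INITIAL_DURATION = timedelta(hours=72)  # three days


def expiry_time_from(expires_in: str, start: datetime) -> datetime:
    """Hypothetical helper: convert a duration such as "90m" or "20h" into an expiry time."""
    match = re.fullmatch(r"([1-9][0-9]*)([mh])", expires_in)
    if match is None:
        raise ValueError(f"unsupported duration format: {expires_in!r}")
    amount, unit = int(match.group(1)), match.group(2)
    duration = timedelta(minutes=amount) if unit == "m" else timedelta(hours=amount)
    if duration > MAX_INITIAL_DURATION:
        raise ValueError("a zonal shift can be active for at most 72 hours")
    return start + duration
```

To model an extension, you could call the function again with the time of the update as start, assuming the new duration is measured from when you update the shift.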

You can cancel both customer-initiated zonal shifts and zonal shifts that AWS starts for a practice run with zonal autoshift.

Zonal autoshift components

The following diagram illustrates an example of an autoshift shifting traffic away from an Availability Zone when internal telemetry indicates that there is an Availability Zone impairment that could potentially impact customers.


Diagram of an autoshift with three Availability Zones

The following are components of the zonal autoshift capabilities in Route 53 ARC.

Zonal autoshift

Zonal autoshift shifts traffic for a resource away from an Availability Zone without requiring you to take any action. With zonal autoshift, AWS starts an autoshift when internal telemetry indicates that there is an Availability Zone impairment that could potentially impact customers. Be aware that, in some cases, traffic might be shifted away for resources that are not experiencing impact.

Practice runs

When you enable zonal autoshift for a resource, you must also configure zonal autoshift practice runs for the resource. About once a week, AWS starts a practice run zonal shift that lasts about 30 minutes. Practice runs help verify that your application can run normally with the loss of one Availability Zone. In a practice run, AWS shifts traffic for a resource away from one Availability Zone with a zonal shift, and then shifts traffic back when the practice run ends.

Practice run configuration

A practice run configuration defines the blocked dates and windows, if any, and the CloudWatch alarms that you specify for practice runs for a resource in zonal autoshift. You can edit a practice run configuration at any time to add or change blocked dates or windows, or to update the alarms for practice runs.

To enable zonal autoshift for a resource, you must have a practice run configuration in place for the resource. You can also delete a practice run configuration, but only after you disable zonal autoshift for the resource.
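The rules for enabling zonal autoshift and deleting a practice run configuration can be sketched as a small state model. The class and field names are hypothetical and only loosely mirror the concepts in this section: an outcome alarm is required, while blocking alarms, blocked dates, and blocked windows are optional.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PracticeRunConfiguration:
    outcome_alarms: list[str]  # CloudWatch alarm ARNs; at least one is required
    blocking_alarms: list[str] = field(default_factory=list)  # optional
    blocked_dates: list[str] = field(default_factory=list)    # optional
    blocked_windows: list[str] = field(default_factory=list)  # optional

    def __post_init__(self) -> None:
        if not self.outcome_alarms:
            raise ValueError("a practice run configuration needs an outcome alarm")


@dataclass
class ManagedResource:
    arn: str
    practice_run_config: Optional[PracticeRunConfiguration] = None
    autoshift_enabled: bool = False

    def enable_autoshift(self) -> None:
        # Zonal autoshift requires a practice run configuration.
        if self.practice_run_config is None:
            raise RuntimeError("configure practice runs before enabling zonal autoshift")
        self.autoshift_enabled = True

    def disable_autoshift(self) -> None:
        self.autoshift_enabled = False

    def delete_practice_run_config(self) -> None:
        # The configuration can be deleted only while zonal autoshift is disabled.
        if self.autoshift_enabled:
            raise RuntimeError("disable zonal autoshift first")
        self.practice_run_config = None
```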

Practice run alarm

When you configure practice runs, you specify CloudWatch alarms that you create in CloudWatch, based on your resource and application requirements. The alarms that you specify can block a practice run from starting, or can stop a practice run in progress, if your application is adversely affected by the practice run.

If an alarm that you specify goes into an ALARM state, Route 53 ARC ends the zonal shift for the practice run, so that traffic for the resource is no longer shifted away from the Availability Zone.

There are two types of alarms that you specify for practice runs: an outcome alarm, to monitor the health of your resource and application during the practice run, and a blocking alarm, which you can configure to prevent practice runs from starting, or to stop an in-progress practice run. The outcome alarm is required; the blocking alarm is optional.

Practice run outcome

Route 53 ARC reports an outcome for each practice run. The following are the possible practice run outcomes:

  • PENDING: The zonal shift for the practice run is active (in progress). There's no outcome to return yet.

  • SUCCEEDED: The outcome alarm did not enter an ALARM state during the practice run, and the practice run completed the full 30-minute test period.

  • INTERRUPTED: The practice run ended for a reason that was not the outcome alarm entering an ALARM state. A practice run can be interrupted for a variety of reasons. For example, a practice run that ends because the blocking alarm specified for the practice run entered an ALARM state has an outcome of INTERRUPTED. For more information about reasons for an INTERRUPTED outcome, see Outcomes for practice runs.

  • FAILED: The outcome alarm entered an ALARM state during the practice run.
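The outcome rules above can be summarized in a hypothetical function. It simplifies by assuming that an outcome alarm entering the ALARM state always yields FAILED, and that any other early end (for example, a blocking alarm) yields INTERRUPTED.

```python
from enum import Enum


class PracticeRunOutcome(Enum):
    PENDING = "PENDING"
    SUCCEEDED = "SUCCEEDED"
    INTERRUPTED = "INTERRUPTED"
    FAILED = "FAILED"


def practice_run_outcome(in_progress: bool, outcome_alarm_fired: bool,
                         minutes_completed: float,
                         test_period_minutes: float = 30) -> PracticeRunOutcome:
    """Hypothetical model of how a practice run outcome is determined."""
    if in_progress:
        return PracticeRunOutcome.PENDING
    if outcome_alarm_fired:
        return PracticeRunOutcome.FAILED
    if minutes_completed >= test_period_minutes:
        return PracticeRunOutcome.SUCCEEDED
    # Ended early for a reason other than the outcome alarm.
    return PracticeRunOutcome.INTERRUPTED
```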

Built-in safety rules

Safety rules built into Route 53 ARC prevent more than one traffic shift for a resource from being in effect at a time. That is, only one customer-initiated zonal shift, practice run zonal shift, or autoshift for the resource can be actively shifting traffic away from an Availability Zone. For example, if you start a zonal shift for a resource when it is currently shifted away with autoshift, your zonal shift takes precedence. For more information, see Zonal autoshift in Amazon Route 53 Application Recovery Controller and Outcomes for practice runs.

Resource identifier

The identifier for a resource to include in a zonal shift. The identifier is the Amazon Resource Name (ARN) for the resource.

You can include in a zonal shift only the resources in your account that belong to an AWS service supported by Route 53 ARC. Those AWS services register their resources with Route 53 ARC.

Note

You can configure zonal autoshift only for Network Load Balancers and Application Load Balancers with cross-zone load balancing turned off.

Managed resource

AWS services register resources automatically with Route 53 ARC for zonal autoshift. A resource that has been registered is a managed resource in Route 53 ARC.

Resource name

The name of a managed resource in Route 53 ARC.

Applied status

An applied status indicates whether a traffic shift is in effect for a resource. When you configure zonal autoshift, a resource can have more than one active traffic shift: a practice run zonal shift, a customer-initiated zonal shift, or an autoshift. However, only one of them is applied, that is, in effect for the resource, at a time. The shift that has the status APPLIED determines which Availability Zone traffic for the resource is shifted away from, and when that traffic shift ends.
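Choosing the single applied shift from several active ones can be sketched as a precedence lookup. The text confirms only that a customer-initiated zonal shift takes precedence over an autoshift; placing practice runs last is an assumption made for this illustration, and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed precedence, highest first.
PRECEDENCE = ("CUSTOMER_INITIATED", "AUTOSHIFT", "PRACTICE_RUN")


@dataclass
class TrafficShift:
    kind: str             # one of PRECEDENCE
    away_from: str        # Availability Zone ID
    expiry_time: datetime


def applied_shift(active_shifts: list[TrafficShift]) -> Optional[TrafficShift]:
    """Return the one shift that is APPLIED, or None when no shift is active."""
    if not active_shifts:
        return None
    return min(active_shifts, key=lambda shift: PRECEDENCE.index(shift.kind))
```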

Readiness check components

The following diagram illustrates a sample recovery group that is configured to support the readiness check feature. Resources in this example are grouped into cells (by AWS Region) and nested cells (by Availability Zones) in a recovery group. There is an overall readiness status for the recovery group (application), as well as individual readiness statuses for each cell (Region) and nested cell (Availability Zone).


A sample recovery group for Route 53 ARC. It has two cells, by Region, and within each Region there are two nested cells, by Availability Zone. The first Region cell has all ready statuses, and the second Region cell has a not ready status because one of its zone cells is not ready. The recovery group is overall not ready.

The following are components of the readiness check feature in Route 53 ARC.

Cell

A cell defines your application's replicas or independent units of failover. It groups all the AWS resources that are necessary for your application to run independently within the replica. For example, you might have one set of resources in a primary cell and another set in a standby cell. You determine the boundary of what a cell includes, but cells typically represent an Availability Zone or a Region. You can have multiple cells (nested cells) within a cell, such as AZs within a Region. Each nested cell represents an isolated unit of failover.

Recovery group

Cells are collected into a recovery group. A recovery group represents an application or group of applications that you want to check failover readiness for. It consists of two or more cells, or replicas, that match each other in terms of functionality. For example, if you have a web application that is replicated across us-east-1a and us-east-1b, where us-east-1b is your failover environment, you can represent this application in Route 53 ARC as a recovery group with two cells: one in us-east-1a and one in us-east-1b. A recovery group can also include a global resource, such as a Route 53 health check.
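The readiness roll-up in the sample recovery group diagram, where one nested cell that is not ready makes its Region cell and the whole recovery group not ready, can be modeled recursively. This is a hypothetical sketch that reduces readiness to a boolean; the cell names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    resources_ready: bool = True  # readiness of the resources checked in this cell itself
    nested_cells: list["Cell"] = field(default_factory=list)

    def is_ready(self) -> bool:
        # A cell is ready only if its own resources and all nested cells are ready.
        return self.resources_ready and all(cell.is_ready() for cell in self.nested_cells)


@dataclass
class RecoveryGroup:
    name: str
    cells: list[Cell]

    def is_ready(self) -> bool:
        return all(cell.is_ready() for cell in self.cells)


group = RecoveryGroup("my-app", [
    Cell("us-east-1", nested_cells=[Cell("use1-az1"), Cell("use1-az2")]),
    Cell("us-west-2", nested_cells=[Cell("usw2-az1"), Cell("usw2-az2", resources_ready=False)]),
])
```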

Resources and resource identifiers

When you create components for readiness checks in Route 53 ARC, you specify a resource, such as an Amazon DynamoDB table, a Network Load Balancer, or a DNS target resource, by using a resource identifier. A resource identifier is either the Amazon Resource Name (ARN) for the resource or, for a DNS target resource, the identifier that Route 53 ARC generates when it creates the resource.

DNS target resource

A DNS target resource is the combination of your application's domain name and other DNS information, such as the AWS resource that the domain points to. Including an AWS resource is optional, but if you provide it, it must be a Route 53 resource record or a Network Load Balancer. When you provide the AWS resource, you can get more detailed architectural recommendations that can help you improve your application's recovery resiliency. You can create resource sets in Route 53 ARC for DNS target resources, and then create a readiness check for the resource set so that you can get architecture recommendations for your application. The readiness check also monitors the DNS routing policy for your application, based on the readiness rules for DNS target resources.

Resource set

A resource set is a set of resources, including AWS resources or DNS target resources, that span multiple cells. For example, you might have a load balancer in us-east-1a and another one in us-east-1b. To monitor the recovery readiness of the load balancers, you can create a resource set that includes both load balancers, and then create a readiness check for the resource set. Route 53 ARC will continually check the readiness of the resources in the set. You can also add a readiness scope to associate resources in a resource set with the recovery group that you create for your application.

Readiness rule

Readiness rules are audits that Route 53 ARC performs against a set of resources in a resource set. Route 53 ARC has a set of readiness rules for each type of resource that it supports readiness checks for. Each rule includes an ID and a description that explains what Route 53 ARC inspects the resources for.

Readiness check

A readiness check monitors a resource set in your application, such as a set of Amazon Aurora instances, that Route 53 ARC is auditing recovery readiness for. Readiness checks can audit configurations such as capacity settings, AWS quotas, or routing policies. For example, if you want to audit readiness for your Amazon EC2 Auto Scaling groups across two Availability Zones, you can create a readiness check for a resource set with two resource ARNs, one for each Auto Scaling group. Then, to make sure that each group is scaled equally, Route 53 ARC continually monitors the instance types and the counts in the two groups.
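A readiness rule like the Auto Scaling group comparison described above can be sketched as a function that reports mismatches across a resource set. The snapshot type, function name, and finding messages are hypothetical, and the sketch assumes each group uses a single instance type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoScalingGroupSnapshot:
    # Hypothetical snapshot of the attributes the readiness check compares.
    arn: str
    instance_type: str
    instance_count: int


def asg_set_findings(groups: list[AutoScalingGroupSnapshot]) -> list[str]:
    """Report differences in instance type or count across the groups in a resource set."""
    findings = []
    types = {group.instance_type for group in groups}
    counts = {group.instance_count for group in groups}
    if len(types) > 1:
        findings.append(f"instance types differ: {sorted(types)}")
    if len(counts) > 1:
        findings.append(f"instance counts differ: {sorted(counts)}")
    return findings
```

An empty list means the groups match on both attributes; any entry is a reason the resource set would not be considered equally scaled.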

Readiness scope

A readiness scope identifies the grouping of resources that a specific readiness check encompasses. The scope of a readiness check can be a recovery group (that is, global to the whole application) or a cell (that is, a Region or Availability Zone). For a resource that is a global resource for Route 53 ARC, set the readiness scope at the recovery group or global resource level. For example, a Route 53 health check is a global resource in Route 53 ARC because it isn't specific to a Region or Availability Zone.

Routing control components

The following diagram illustrates an example of components that support the routing control feature in Route 53 ARC. The routing controls shown here (grouped into one control panel) let you manage traffic to two Availability Zones in each of two Regions. When you update routing control states, Route 53 ARC changes health checks in Amazon Route 53, which redirect DNS traffic to different cells. Safety rules that you configure for routing controls help avoid fail-open scenarios and other unintentional consequences.


Components that support routing control in Route 53 ARC

The following are components of the routing control feature in Route 53 ARC.

Cluster

A cluster is a set of five redundant Regional endpoints against which you initiate API calls to update or get routing control states. A cluster includes a default control panel, and you can host multiple control panels and routing controls on one cluster.

Routing controls

A routing control is a simple on/off switch, hosted on a cluster, that you use to control routing of client traffic in and out of cells. When you create a routing control, you add a Route 53 ARC health check in Route 53. This enables you to reroute traffic (using the health checks, configured with DNS records for your applications) when you update the routing control state in Route 53 ARC.

Routing control health check

Routing controls are integrated with health checks in Route 53. The health checks are associated with DNS records that front each application replica, for example, failover records. When you change routing control states, Route 53 ARC updates the corresponding health checks, which redirect traffic, for example, to fail over to your standby replica.
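The chain from routing control state to DNS answer can be sketched for a simple primary/secondary failover record pair. The sketch relies on the mapping in which a routing control that is On makes its health check healthy and one that is Off makes it unhealthy, and it assumes, for simplicity, that only the primary record has a routing control health check. Function names and targets are hypothetical.

```python
def health_check_status(routing_control_state: str) -> str:
    # On -> healthy, Off -> unhealthy.
    return {"On": "Healthy", "Off": "Unhealthy"}[routing_control_state]


def failover_answer(primary_control_state: str, primary_target: str, secondary_target: str) -> str:
    """Simplified failover routing: answer with the primary while its health check is healthy."""
    if health_check_status(primary_control_state) == "Healthy":
        return primary_target
    return secondary_target
```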

Control panel

A control panel groups together a set of related routing controls. You can associate multiple routing controls with one control panel, and then create safety rules for the control panel to ensure that the traffic redirection updates that you make are safe. For example, you can configure a routing control for each of your load balancers in each Availability Zone, and then group them in the same control panel. Then you can add a safety rule (an "assertion rule") that makes sure that at least one zone (represented by a routing control) is active at any one time, to avoid unintended "fail-open" scenarios.
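An "at least one zone is On" assertion rule can be sketched as a check on the state that a proposed update would produce. This is a hypothetical model, not the Route 53 ARC evaluation logic; the routing control names are placeholders.

```python
def assertion_rule_allows(current: dict[str, str], proposed: dict[str, str],
                          asserted_controls: list[str], at_least: int = 1) -> bool:
    """Return True if, after the proposed update, at least `at_least`
    of the asserted routing controls would be On."""
    resulting = {**current, **proposed}
    on_count = sum(1 for name in asserted_controls if resulting[name] == "On")
    return on_count >= at_least


current = {"az1-lb": "On", "az2-lb": "On"}
zones = ["az1-lb", "az2-lb"]
```

Evaluating the resulting state, rather than each change in isolation, is what catches an update that would turn every zone Off at once.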

Default control panel

When you create a cluster, Route 53 ARC creates a default control panel. By default, routing controls that you create on the cluster are added to the default control panel. Alternatively, you can create your own control panels to group related routing controls.

Safety rule

Safety rules are rules that you add to Route 53 ARC to ensure that recovery actions don't accidentally impair your application's availability. For example, you can create a safety rule (a "gating rule") that uses a routing control as an overall "on/off" switch, so that you can enable or disable changes to a set of other routing controls.
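A gating rule can be sketched as a check that allows changes to the target routing controls only while the gating control is On. This hypothetical model evaluates the gating control's current state and always allows updates that touch only the gating control; the control names are placeholders.

```python
def gating_rule_allows(current: dict[str, str], proposed: dict[str, str],
                       gating_control: str, target_controls: set[str]) -> bool:
    """Return True if a single-gate rule allows the proposed update."""
    touches_targets = any(name in target_controls for name in proposed)
    if not touches_targets:
        return True
    # Target routing controls can change only while the gating control is On.
    return current[gating_control] == "On"
```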

Endpoint (cluster endpoint)

Each cluster in Route 53 ARC has five Regional endpoints that you can use for setting and retrieving routing control states. Your process for accessing the endpoints should assume that Route 53 ARC regularly brings the endpoints up and down for maintenance, so you should try each endpoint in succession until you connect to one. You access the endpoints to get the current state of routing controls (On or Off) and to trigger failovers for your applications by changing routing control states.
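Trying the cluster endpoints in succession can be sketched as a generic retry loop. The function name is hypothetical, the call argument stands in for whatever client call you make against one endpoint, and shuffling the order is an optional choice that the text doesn't require. In real code you would typically catch only connection and timeout errors rather than every exception.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_endpoint_failover(endpoints: Sequence[str], call: Callable[[str], T],
                                shuffle: bool = True) -> T:
    """Call `call(endpoint)` on each endpoint in turn until one succeeds."""
    order = list(endpoints)
    if shuffle:
        random.shuffle(order)  # optional: spreads requests across endpoints
    errors = []
    for endpoint in order:
        try:
            return call(endpoint)
        except Exception as err:  # simplification; see the lead-in
            errors.append(err)
    raise RuntimeError(f"all {len(order)} cluster endpoints failed") from (errors[-1] if errors else None)
```

With boto3, call could, for example, create a route53-recovery-cluster client for one endpoint (passing endpoint_url and region_name) and invoke get_routing_control_state on it.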