Amazon Route 53 Application Recovery Controller components

This section defines the components included in Amazon Route 53 Application Recovery Controller readiness check and routing control features.

Readiness check components

The following diagram illustrates a sample recovery group that is configured to support the readiness check feature. Resources in this example are grouped into cells (by Region) and nested cells (by Availability Zone) in a recovery group. There is an overall readiness status for the recovery group (the application), as well as individual readiness statuses for each cell (Region) and nested cell (Availability Zone).


Figure: A sample recovery group for Application Recovery Controller. It has two cells, by Region, and within each Region there are two nested cells, by Availability Zone. The first Region cell has all ready statuses, and the second Region cell has a not ready status because one of its zone cells is not ready. The recovery group is overall not ready.

The following are components of the readiness check feature in Application Recovery Controller.

Cell

A cell defines your application's replicas or independent units of failover. It groups all AWS resources that are necessary for your application to run independently within the replica. You might have a set of resources in your primary cell (or replica) and another set in your standby cell. You determine the boundary of what a cell includes, but it's typically an Availability Zone or a Region. You can have multiple cells (nested cells) within a Region cell, where each nested cell represents an isolated unit of failover.

Recovery group

Cells are collected into a recovery group. A recovery group represents an application or group of applications that you want to check failover readiness for. It typically consists of two or more cells, or replicas, that mirror each other in terms of functionality. For example, if you have a web application that is replicated across us-east-1a and us-east-1b, where us-east-1b is your failover environment, you can represent this application in Application Recovery Controller as a recovery group with two cells: one in us-east-1a and one in us-east-1b. A recovery group can also include a global resource, such as a Route 53 health check.
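The roll-up of readiness from nested cells to cells to the recovery group, as described for the sample diagram, can be sketched in a few lines of Python. This is only an illustrative model: the `Cell` and `RecoveryGroup` classes and their `is_ready` methods are hypothetical names, not Application Recovery Controller types, and the sketch assumes a simple two-state (ready or not ready) status.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    """Hypothetical model of a cell; not an Application Recovery Controller type."""
    name: str
    ready: bool = True  # readiness of resources placed directly in this cell
    nested: List["Cell"] = field(default_factory=list)

    def is_ready(self) -> bool:
        # A cell is ready only if it and every nested cell are ready.
        return self.ready and all(cell.is_ready() for cell in self.nested)


@dataclass
class RecoveryGroup:
    """Hypothetical model of a recovery group made up of top-level cells."""
    name: str
    cells: List[Cell]

    def is_ready(self) -> bool:
        # The recovery group is ready only if every cell is ready.
        return all(cell.is_ready() for cell in self.cells)


# Mirrors the sample diagram: two Region cells, each with two zone cells,
# and one zone cell in the second Region that is not ready.
group = RecoveryGroup("web-app", [
    Cell("us-east-1", nested=[Cell("us-east-1a"), Cell("us-east-1b")]),
    Cell("us-west-2", nested=[Cell("us-west-2a"),
                              Cell("us-west-2b", ready=False)]),
])

for cell in group.cells:
    print(cell.name, "ready" if cell.is_ready() else "not ready")
print(group.name, "ready" if group.is_ready() else "not ready")
```

In this model a single zone cell that is not ready makes both its Region cell and the whole recovery group not ready, which matches the statuses described for the sample diagram.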

AWS resource

A resource is one of your AWS resources, such as an Amazon DynamoDB table, that you specify with the Amazon Resource Name (ARN) for the resource.

DNS target resource

A DNS target resource is the combination of your application's domain name and other DNS information. It also includes the AWS resource that the domain points to, such as a Network Load Balancer. You can specify DNS target resources in resource sets, and then run a readiness check or get architecture recommendations.

Resource set

A resource set is a set of resources, including AWS resources or DNS target resources, that span multiple cells. For example, you might have a load balancer in us-east-1a and another one in us-east-1b. To monitor the recovery readiness of the load balancers, you can create a resource set that includes both load balancers, and then create a readiness check for the resource set. Application Recovery Controller will continually run the readiness check for the resources. Then you can associate the resource set with a recovery group that you create for your application.
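As a rough illustration of the load balancer example, the following sketch assembles the parameters for a resource set that spans two cells. The helper `build_resource_set_request` is hypothetical, all ARNs are placeholders, and the parameter names (`ResourceSetName`, `ResourceSetType`, `Resources`, `ResourceArn`, `ReadinessScopes`) reflect an assumption about the shape of the CreateResourceSet request that you should verify against the API reference before use.

```python
from typing import Dict, List


def build_resource_set_request(name: str, resource_type: str,
                               arn_to_cell_arn: Dict[str, str]) -> dict:
    """Build keyword arguments for a CreateResourceSet-style call.

    Each resource is paired with the cell (its readiness scope) it belongs to.
    """
    resources: List[dict] = [
        {"ResourceArn": arn, "ReadinessScopes": [cell_arn]}
        for arn, cell_arn in arn_to_cell_arn.items()
    ]
    return {
        "ResourceSetName": name,
        "ResourceSetType": resource_type,
        "Resources": resources,
    }


# Placeholder ARNs for illustration only.
request = build_resource_set_request(
    "example-load-balancers",
    "AWS::ElasticLoadBalancingV2::LoadBalancer",  # assumed type string
    {
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/example-a/0000000000000000":
            "arn:aws:route53-recovery-readiness::111122223333:cell/example-cell-1a",
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/net/example-b/1111111111111111":
            "arn:aws:route53-recovery-readiness::111122223333:cell/example-cell-1b",
    },
)
print(request["ResourceSetName"], len(request["Resources"]))
```

A dictionary like this could then be passed as keyword arguments to an SDK client, followed by a separate call that creates a readiness check for the resource set by name.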

Readiness rule

A readiness rule is a check that Application Recovery Controller performs against a set of resources in a resource set. Application Recovery Controller has a set of readiness rules for each type of resource that it supports readiness checks for. Each rule includes an ID and a description that explains what Application Recovery Controller inspects the resources for.

Readiness check

A readiness check monitors a resource set in your application, such as a set of Amazon Aurora instances, that Application Recovery Controller is auditing recovery readiness for. Readiness audits can include checking for capacity, configuration, AWS quotas, or routing policies, depending on the resource. For example, if you want to audit readiness for your Amazon EC2 Auto Scaling groups across two Availability Zones, you can create a readiness check for a resource set with two resource ARNs, one for each Auto Scaling group. Then Application Recovery Controller continually monitors the instance types and the counts in the two groups, to make sure that each group is scaled equally.
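To make the Auto Scaling example concrete, the following sketch shows the kind of comparison such a readiness audit implies: checking that the groups use the same instance type and the same capacity. It is a conceptual model only and does not represent how Application Recovery Controller implements its readiness rules; `GroupSnapshot` and `find_mismatches` are hypothetical names, and the ARNs are shortened placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GroupSnapshot:
    """Hypothetical snapshot of one Auto Scaling group's relevant settings."""
    arn: str
    instance_type: str
    desired_capacity: int


def find_mismatches(groups: List[GroupSnapshot]) -> List[str]:
    """Compare every group against the first one and describe differences."""
    if not groups:
        return []
    baseline = groups[0]
    problems: List[str] = []
    for group in groups[1:]:
        if group.instance_type != baseline.instance_type:
            problems.append(
                f"{group.arn}: instance type {group.instance_type} "
                f"differs from {baseline.instance_type}")
        if group.desired_capacity != baseline.desired_capacity:
            problems.append(
                f"{group.arn}: capacity {group.desired_capacity} "
                f"differs from {baseline.desired_capacity}")
    return problems


snapshots = [
    GroupSnapshot("arn:example:asg-us-east-1a", "m5.large", 4),
    GroupSnapshot("arn:example:asg-us-east-1b", "m5.large", 2),
]
mismatches = find_mismatches(snapshots)
print("ready" if not mismatches else mismatches)
```

Here the second group has half the capacity of the first, so the sketch reports one mismatch and the resource set would be considered not ready under this simplified rule.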

Readiness scope

Readiness scope is the grouping that a specific readiness status applies to. It can be a recovery group (an application) or a cell (a Region or Availability Zone). A resource whose readiness scope is the recovery group is treated as a global resource in Application Recovery Controller. For example, a Route 53 health check is a global resource because it isn't scoped to a Region or Availability Zone.

Routing control components

The following diagram illustrates an example of the components that support the routing control feature in Application Recovery Controller. The routing controls included here (grouped into one control panel) let you manage traffic to two Availability Zones in each of two Regions. Health checks in Amazon Route 53 integrate with the controls to redirect traffic. Safety rules that you configure help avoid fail-open scenarios or other unintentional consequences.


Figure: Components that support routing control in Application Recovery Controller.

The following are components of the routing control feature in Application Recovery Controller.

Cluster

A cluster is a set of five redundant Regional endpoints against which you can execute API calls to update or get the state of routing controls. You can host multiple control panels and routing controls on one cluster.

Routing control

A routing control is a simple on/off switch, hosted on a cluster, that you use to control routing of client traffic in and out of cells. When you create a routing control, you add a health check in Route 53 so that Route 53 reroutes traffic when you update the routing control state in Application Recovery Controller.

Routing control health check

Routing controls can be integrated with health checks in Route 53. The health checks are associated with DNS failover records that front each application replica. This lets you control traffic to your replicas by updating routing control states.
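As a sketch of that integration, the following helper assembles a request for a Route 53 health check that is tied to a routing control. The function name `build_health_check_request` is hypothetical and the ARN is a placeholder. The `RECOVERY_CONTROL` health check type and the `RoutingControlArn` field are given here as an assumption about the Route 53 CreateHealthCheck request shape, so confirm them in the Route 53 API reference.

```python
import uuid


def build_health_check_request(routing_control_arn: str) -> dict:
    """Build keyword arguments for a CreateHealthCheck-style call that
    links a health check to a routing control."""
    return {
        # A unique string that makes the request safe to retry.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "RECOVERY_CONTROL",
            "RoutingControlArn": routing_control_arn,
        },
    }


# Placeholder ARN for illustration only.
health_check_request = build_health_check_request(
    "arn:aws:route53-recovery-control::111122223333:"
    "controlpanel/example-panel/routingcontrol/example-control")
print(health_check_request["HealthCheckConfig"]["Type"])
```

The resulting health check would then be attached to the DNS failover record for a replica, so that setting the routing control to Off makes the record unhealthy and moves traffic away from that replica.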

Control panel

A control panel groups together a set of related routing controls. You can associate multiple routing controls with one control panel, and then create safety rules for the control panel to ensure that the traffic redirection updates that you make are safe. For example, you can configure a routing control for each of your load balancers in each Availability Zone, and then group them in the same control panel. Then you can add a safety rule (an "assertion rule") that makes sure that at least one zone (represented by a routing control) is active at any one time, to avoid unintended "fail-open" scenarios.
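The effect of an assertion rule such as "at least one zone must be active" can be modeled as a check that runs against the proposed states before a change is accepted. The following is a minimal sketch of that idea, not the service's implementation; `assertion_allows` and the control names are hypothetical.

```python
from typing import Dict


def assertion_allows(current: Dict[str, bool], changes: Dict[str, bool],
                     at_least: int = 1) -> bool:
    """Return True if applying `changes` leaves at least `at_least`
    routing controls set to On (True)."""
    proposed = {**current, **changes}  # states after the requested update
    return sum(proposed.values()) >= at_least


states = {"zone-a": True, "zone-b": False}

# Turning off the only active zone would leave nothing On, so it is rejected.
print(assertion_allows(states, {"zone-a": False}))

# Switching traffic from zone-a to zone-b in one update keeps one zone On.
print(assertion_allows(states, {"zone-a": False, "zone-b": True}))
```

Evaluating the rule against the combined, proposed states is what allows a swap between zones to succeed as a single update while still blocking a change that would turn every zone off.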

Default control panel

When you create a cluster, Application Recovery Controller creates a default control panel. By default, all routing controls that you create on the cluster are added to the default control panel. Alternatively, you can create your own control panels to group related routing controls.

Safety rule

Safety rules are rules that you add to Application Recovery Controller to ensure that recovery actions don't accidentally impair your application's availability. For example, you can create a safety rule that creates a routing control that acts as an overall "on/off" switch so that you can enable or disable a set of other routing controls.
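The overall "on/off" switch in this example can be modeled as a gate in front of a set of target routing controls: changes to the targets are accepted only while the gating control is On. The sketch below illustrates that behavior under the assumption that all gating controls must be On. The function `gating_allows` and the control names are hypothetical, and the code is not the service's rule engine.

```python
from typing import Dict, Iterable


def gating_allows(states: Dict[str, bool],
                  gating_controls: Iterable[str],
                  target_controls: Iterable[str],
                  changes: Dict[str, bool]) -> bool:
    """Return True if `changes` may be applied.

    Changes that don't touch a target control are always allowed; changes
    to a target control require every gating control to be On (True).
    """
    targets = set(target_controls)
    if not targets.intersection(changes):
        return True
    return all(states[name] for name in gating_controls)


states = {"master-switch": False, "zone-a": True, "zone-b": True}

# The gate is Off, so a change to a target control is rejected.
print(gating_allows(states, ["master-switch"], ["zone-a", "zone-b"],
                    {"zone-a": False}))

# After the gate is turned On, the same change is allowed.
states["master-switch"] = True
print(gating_allows(states, ["master-switch"], ["zone-a", "zone-b"],
                    {"zone-a": False}))
```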

Endpoint (cluster endpoint)

Each cluster in Application Recovery Controller has five Regional endpoints that you can use for setting and retrieving routing control states. Your process for accessing the endpoints should assume that Application Recovery Controller regularly brings the endpoints up and down for maintenance, so you should try each endpoint in succession until you connect to one. You access the endpoints to get the current state of routing controls (On or Off) and to trigger failovers for your applications by changing routing control states.
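The guidance to try each endpoint in succession can be sketched as a simple loop. In the following example, `set_routing_control_state` and the `make_client` factory are hypothetical names. The factory is assumed to return an object with an `update_routing_control_state` method that accepts `RoutingControlArn` and `RoutingControlState`, and each endpoint is assumed to be a mapping with `Endpoint` and `Region` keys. Shuffling the endpoint order is a design choice of this sketch, not something the text above requires.

```python
import random
from typing import Any, Callable, Iterable, Mapping


def set_routing_control_state(
        endpoints: Iterable[Mapping[str, str]],
        make_client: Callable[[Mapping[str, str]], Any],
        routing_control_arn: str,
        state: str) -> str:
    """Try each cluster endpoint in turn until one accepts the update.

    Returns the URL of the endpoint that handled the request.
    """
    if state not in ("On", "Off"):
        raise ValueError("state must be 'On' or 'Off'")

    candidates = list(endpoints)
    random.shuffle(candidates)  # avoid always hitting the same endpoint first

    last_error = None
    for endpoint in candidates:
        try:
            client = make_client(endpoint)
            client.update_routing_control_state(
                RoutingControlArn=routing_control_arn,
                RoutingControlState=state)
            return endpoint["Endpoint"]
        except Exception as exc:
            # Simplification: every error moves on to the next endpoint.
            # Real code should retry only connectivity errors and surface
            # others (for example, a safety rule rejection) immediately.
            last_error = exc
    raise RuntimeError("no cluster endpoint accepted the update") from last_error
```

With an AWS SDK, `make_client` would typically build a client for the cluster data plane from the endpoint's URL and Region. With boto3, for example, that might look like `boto3.client("route53-recovery-cluster", region_name=ep["Region"], endpoint_url=ep["Endpoint"])`; treat that call as an assumption to confirm against the SDK documentation.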