Readiness check in Amazon Route 53 Application Recovery Controller

This chapter explains how to model your application in Amazon Route 53 Application Recovery Controller by creating a recovery group and cells, and then how to add readiness checks and readiness scopes so that Route 53 ARC can audit readiness for your application.

After you create readiness checks, you can monitor the readiness status of your resources and application. You can set up the cross-account authorization feature in Route 53 ARC to make it easier to set up and monitor distributed resources from a single AWS account.

A readiness check in Amazon Route 53 Application Recovery Controller continually (at one-minute intervals) monitors for mismatches in AWS provisioned capacity, service quotas, throttle limits, and configuration and version discrepancies for the resources included in the check. Readiness checks can notify you of issues, and can even help ensure that the quotas for the load balancers in your application replicas match across AWS Regions. For example, if a developer requests a quota increase for a load balancer in the primary Region but forgets to do so in the standby Region, Route 53 ARC can detect the mismatch with a readiness check and automatically increase the quota in the standby Region.

The first step is to create a recovery group that represents your application. Each recovery group includes a cell for each individual failure-containment unit or replica of your application. Next, you create resource sets for each resource type in your application and associate readiness checks with the resource sets. Finally, you associate the resources with readiness scopes, so you can get readiness status about the resources in a recovery group (your application) or in individual cells (replicas, which are Regions or Availability Zones).

A Route 53 ARC readiness check determines if a resource set, cell, or recovery group is READY or NOT READY, based on the resources that are in the scope of the readiness check and the rules that apply to that resource type. Readiness status is based on how readiness rules are defined. All readiness rules compare resources to each other, but they differ in how they do so and in how readiness check results are determined.

Readiness checks help you to ensure that your standby application replica and its resources match your production replica on an ongoing basis, reflecting the capacity, routing policies, and other configuration details of your production application. If the replicas don't match, you can add capacity or change a configuration so that they are aligned again.

Tip

You can see a list of the sets of readiness rules for each resource type that readiness checks use to audit resources. For more information, see Readiness rules in Route 53 ARC.

By adding readiness checks, you can monitor readiness status with EventBridge, in the AWS Management Console, or by using Route 53 ARC API actions. After you create resource sets and set up readiness checks for them, Route 53 ARC continually (once every minute) audits your resources. These audits inspect your resources for readiness in a number of ways, depending on the resource. For example, readiness checks can check whether provisioned quotas match for resources in a specific readiness scope, and if not, Route 53 ARC can take corrective action by increasing quotas. For more information, see How Amazon Route 53 Application Recovery Controller works.

After setting up the checks, you can monitor the readiness status of resources in different contexts, including the readiness of cells and the readiness of your application.

Readiness checks and disaster recovery scenarios

Route 53 ARC readiness checks give you insights into whether your applications and resources are ready for recovery by helping you make sure your applications are scaled to handle failover traffic and configured to route around failures. However, readiness check statuses should not be used in a disaster recovery scenario as a signal that a standby replica is ready to be failed over to. You should use readiness checks as a supplement to your application and infrastructure monitoring or health checker systems to determine whether to fail away from or to a replica.

In an urgent situation or an outage, use a combination of health checks and other information to determine that your standby is scaled up, healthy, and ready for you to fail over production traffic. For example, check to see if canaries that run against your standby cell are meeting your success criteria, in addition to verifying that readiness check statuses for the standby are READY.
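
The following minimal sketch illustrates the idea of treating readiness status as one supplementary input rather than as the failover signal itself. The `StandbySignals` type, its fields, and the 0.99 canary threshold are hypothetical names and values chosen for illustration; they are not part of Route 53 ARC.

```python
from dataclasses import dataclass


@dataclass
class StandbySignals:
    """Hypothetical snapshot of signals gathered for a standby cell."""
    canary_success_rate: float   # fraction of canary runs that passed, 0.0 to 1.0
    health_checks_passing: bool  # result from your own health checker system
    readiness_status: str        # "READY" or "NOT READY", as reported by readiness checks


def standby_ready_for_failover(signals: StandbySignals,
                               min_canary_success: float = 0.99) -> bool:
    """Require every signal to agree; readiness status alone is never enough."""
    return (
        signals.canary_success_rate >= min_canary_success
        and signals.health_checks_passing
        and signals.readiness_status == "READY"
    )
```

In this sketch, a READY readiness status cannot compensate for failing canaries or health checks; it only confirms what the other signals already indicate.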

Be aware that Route 53 ARC readiness checks are not highly available. This means that during an outage or disaster, readiness check information could become stale or the checks could become unavailable. For more information, see Data and control planes for Route 53 ARC.

Readiness checks and readiness scopes

Readiness checks always audit groups of resources in resource sets. You create resource sets (separately, or while you're creating a readiness check) to group the resources that are in the cells (Availability Zones or AWS Regions) in your Route 53 ARC recovery group, so that you can define readiness checks. A resource set is typically a group of same-type resources (like Network Load Balancers) but can also be DNS target resources, for architectural readiness checks.

You typically create one resource set and readiness check for each type of resource in your application. For an architectural readiness check, you create a top-level DNS target resource and a global (recovery group level) resource set for it, and then create cell-level DNS target resources for a separate resource set.

The following diagram shows an example of a recovery group with three cells (Availability Zones), each with a Network Load Balancer and Auto Scaling group.


Figure: A sample recovery group for Route 53 ARC. It has three cells (AZs), each with one NLB and one EC2 Auto Scaling group.

In this scenario, you would create a resource set and readiness check for the three Network Load Balancers, and a resource set and readiness check for the three Auto Scaling groups. Now you have a readiness check for each set of resources for your recovery group, by resource type.
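
As a rough illustration of grouping by resource type, the following sketch sorts a hypothetical inventory of the six resources in this example into one resource set per type. The resource names, cell names, and type labels are placeholders, not real identifiers.

```python
from collections import defaultdict

# Hypothetical inventory for the three-cell example: (resource type, cell, resource name).
RESOURCES = [
    ("NetworkLoadBalancer", "az-1a", "nlb-1a"),
    ("NetworkLoadBalancer", "az-1b", "nlb-1b"),
    ("NetworkLoadBalancer", "az-1c", "nlb-1c"),
    ("AutoScalingGroup", "az-1a", "asg-1a"),
    ("AutoScalingGroup", "az-1b", "asg-1b"),
    ("AutoScalingGroup", "az-1c", "asg-1c"),
]


def group_into_resource_sets(resources):
    """Return {resource type: [resource names]}, that is, one resource set per type."""
    sets = defaultdict(list)
    for resource_type, _cell, name in resources:
        sets[resource_type].append(name)
    return dict(sets)


resource_sets = group_into_resource_sets(RESOURCES)
```

Each key in `resource_sets` corresponds to one resource set, and each resource set would get its own readiness check.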

By creating readiness scopes for resources, you can add readiness check summaries for cells or recovery groups. To specify a readiness scope for a resource, you associate the ARN of the cell or recovery group with each resource in a resource set. You can do this when you're creating a readiness check for a resource set.

For example, when you add a readiness check for a resource set for the Network Load Balancers for this recovery group, you can add readiness scopes to each Network Load Balancer at the same time. In this case, you would associate the ARN of AZ 1a with the Network Load Balancer in AZ 1a, the ARN of AZ 1b with the Network Load Balancer in AZ 1b, and the ARN of AZ 1c with the Network Load Balancer in AZ 1c. When you create a readiness check for the Auto Scaling groups, you would do the same, assigning a readiness scope to each of them when you create the readiness check for the Auto Scaling group resource set.
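
The pairing of each resource with its cell can be sketched as a plain data structure. The field names below (`ResourceSetName`, `ResourceSetType`, `Resources`, `ResourceArn`, `ReadinessScopes`) are assumed to mirror the Route 53 ARC resource set API and should be verified against the API reference before use. The account ID, ARNs, and the helper function are placeholders invented for this example.

```python
ACCOUNT = "123456789012"  # placeholder account ID


def cell_arn(cell_name):
    """Build a placeholder cell ARN; the exact ARN format is an assumption."""
    return f"arn:aws:route53-recovery-readiness::{ACCOUNT}:cell/{cell_name}"


def build_resource_set_request(name, resource_type, scope_by_resource_arn):
    """Pair every resource ARN with the ARN of the cell that contains it."""
    return {
        "ResourceSetName": name,
        "ResourceSetType": resource_type,
        "Resources": [
            {"ResourceArn": arn, "ReadinessScopes": [scope]}
            for arn, scope in scope_by_resource_arn.items()
        ],
    }


# One Network Load Balancer per cell, each scoped to its own Availability Zone.
nlb_scopes = {
    f"arn:aws:elasticloadbalancing:us-east-1:{ACCOUNT}:loadbalancer/net/nlb-{az}/0123456789abcdef":
        cell_arn(f"az-{az}")
    for az in ("1a", "1b", "1c")
}

request = build_resource_set_request(
    "nlb-resource-set",
    "AWS::ElasticLoadBalancingV2::LoadBalancer",
    nlb_scopes,
)
```

The same helper could be reused for the Auto Scaling group resource set by passing a different type and a different ARN-to-cell mapping.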

Associating readiness scopes when you create a readiness check is optional, but we strongly recommend that you set them. Readiness scopes enable Route 53 ARC to show the correct READY or NOT READY readiness status for recovery group summary readiness checks and cell-level summary readiness checks. Unless you set readiness scopes, Route 53 ARC can't provide these summaries.
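
To see why summaries depend on scopes, consider this simplified sketch of a roll-up. It assumes, for illustration only, that a cell is READY only when every resource scoped to it is READY; the actual aggregation that Route 53 ARC performs may differ. Resources without a scope cannot be attributed to any cell, so they drop out of the summary.

```python
def summarize_by_scope(resource_statuses):
    """Roll up (scope, status) pairs into {scope: status}.

    A scope of None models a resource that has no readiness scope set.
    """
    summary = {}
    for scope, status in resource_statuses:
        if scope is None:
            continue  # no scope: the resource cannot be attributed to a cell
        if status != "READY":
            summary[scope] = "NOT READY"       # one failure marks the whole scope
        else:
            summary.setdefault(scope, "READY")  # never overwrites an earlier failure
    return summary


statuses = [
    ("az-1a", "READY"),
    ("az-1a", "NOT READY"),
    ("az-1b", "READY"),
    (None, "NOT READY"),  # unscoped resource, invisible to the summary
]
cell_summary = summarize_by_scope(statuses)
```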

Note that when you add an application-level or global resource, such as a DNS routing policy, you don't choose a recovery group or cell for the readiness scope. Instead, you choose global resource (no cell).

How readiness rules determine readiness status

Route 53 ARC readiness checks determine readiness status based on the predefined rules for each resource type and the way those rules are defined. Route 53 ARC includes one group of rules for each type of resource that it supports. For example, Route 53 ARC has groups of readiness rules for Amazon Aurora clusters, Auto Scaling groups, and so on. Note that at this time, you can't add, edit, or remove readiness rules, or groups of rules.

You can view all the readiness rules for each resource type in the AWS Management Console when you create a resource set, or you can view the readiness rules later by navigating to the details page for a resource set. You can also view readiness rules in the following section: Readiness rules in Route 53 ARC.

When a readiness check monitors a set of resources with a set of rules, the way the rules are defined determines whether the outcome will be READY or NOT READY for all of the resources or if the result will be different for different resources. In addition, you can view readiness status in multiple ways. For example, you can see the readiness status of a group of resources in a resource set or view a summary of readiness for a recovery group or a cell (that is, an AWS Region or Availability Zone, depending on how you've set up your recovery group).

Each rule description includes how it compares the resources in a resource set to each other to determine the readiness status when that rule is applied. A rule is defined to inspect each resource or to inspect all resources in a resource set to determine readiness. Specifically, the rules work as follows:

  • The rule inspects each resource in the resource set to ensure a condition.

    • If all resources succeed, all resources are set as READY.

    • If one resource fails, that resource is set as NOT READY, and the other resources remain READY.

    For example: MskClusterState: Inspects each Amazon MSK cluster to ensure that it is in an ACTIVE state.

  • The rule inspects all resources in the resource set to ensure a condition.

    • If the condition is ensured, all resources are set as READY.

    • If any fails to meet the condition, all resources are set as NOT READY.

    For example: VpcSubnetCount: Inspects all VPC subnets to ensure that they have the same number of subnets.

  • Non-critical rule: The rule inspects all resources in the resource set to ensure a condition.

    • If any fails, the status is not changed. Note: These rules state this behavior in their description.

    For example, the following rule is a non-critical rule:

    ElbV2CheckAzCount: Inspects each Network Load Balancer to ensure that it is attached to only one Availability Zone. Note: This rule does not affect readiness status.
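
The three behaviors described in this list can be sketched as three small functions. This is an illustrative model of the described outcomes only, not the Route 53 ARC implementation; the function names and the sample resource values are hypothetical.

```python
def per_resource_rule(resources, condition):
    """Each resource is judged on its own value."""
    return {
        name: "READY" if condition(value) else "NOT READY"
        for name, value in resources.items()
    }


def all_resources_rule(resources, condition):
    """One verdict for the whole set; the condition sees all values together."""
    status = "READY" if condition(list(resources.values())) else "NOT READY"
    return {name: status for name in resources}


def non_critical_rule(resources, condition, current_status):
    """Report whether the condition held, but leave readiness status unchanged."""
    passed = condition(list(resources.values()))
    return dict(current_status), passed


# Per-resource rule, in the spirit of MskClusterState: only the failing cluster is NOT READY.
msk_status = per_resource_rule(
    {"msk-a": "ACTIVE", "msk-b": "CREATING"},
    lambda state: state == "ACTIVE",
)

# All-resources rule, in the spirit of VpcSubnetCount: a mismatch marks every resource.
subnet_status = all_resources_rule(
    {"vpc-a": 3, "vpc-b": 2},
    lambda counts: len(set(counts)) == 1,
)

# Non-critical rule: the condition fails, yet the status stays as it was.
unchanged, passed = non_critical_rule(
    {"nlb-a": 2},
    lambda az_counts: all(count == 1 for count in az_counts),
    {"nlb-a": "READY"},
)
```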

In addition, Route 53 ARC takes an extra step for quotas. If a readiness check detects a mismatch across cells for service quotas (the maximum value for resource creation and operations) for any supported resource, Route 53 ARC automatically raises the quota for the resource with the lower quota. This applies only to quotas (limits). For capacity, you should add additional capacity as required for your application needs.
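
The quota comparison can be pictured with a short sketch. It assumes, as a simplification not stated in the text above, that the lower quota is raised to match the highest value found across cells; the function name and the sample numbers are hypothetical.

```python
def find_quota_mismatches(quotas_by_cell):
    """Return {cell: target quota} for every cell whose quota is below the highest one."""
    target = max(quotas_by_cell.values())
    return {
        cell: target
        for cell, value in quotas_by_cell.items()
        if value < target
    }


# The primary Region had its quota raised to 50; the standby Region was forgotten.
increases = find_quota_mismatches({"primary": 50, "standby": 20})
```

Here `increases` names only the standby cell. A similar comparison for capacity would only report the gap, because adding capacity remains your responsibility.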

You can also set up an Amazon EventBridge notification for readiness checks, for example, when any readiness check status changes to NOT READY. Then when a configuration mismatch is detected, EventBridge sends you a notification and you can take corrective action to make sure that your application replicas are aligned and prepared for recovery. For more information, see Using Route 53 ARC with Amazon EventBridge.
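
As a hedged sketch, an event pattern for such a rule might be assembled as follows. The `source`, `detail-type`, and `detail` field values shown here are assumptions based on the event format described in the Route 53 ARC EventBridge documentation; confirm them there before creating a rule.

```python
import json

# Assumed event pattern: match readiness checks whose new status is NOT_READY.
not_ready_pattern = {
    "source": ["aws.route53-recovery-readiness"],
    "detail-type": [
        "Route 53 Application Recovery Controller readiness check status change"
    ],
    "detail": {
        "new-state": {
            "readiness-status": ["NOT_READY"],
        }
    },
}

# EventBridge rules take the pattern as a JSON string.
pattern_json = json.dumps(not_ready_pattern, indent=2)
```

You would supply `pattern_json` as the event pattern when you create the rule, and attach a target such as a notification topic.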

DNS target resource readiness checks: Auditing resiliency readiness

With DNS target resource readiness checks in Route 53 ARC, you can audit the architectural and resiliency readiness of your application. This type of readiness check continually scans your application's architecture and Amazon Route 53 routing policies to audit for cross-zone and cross-Region dependencies.

A recovery-oriented application has multiple replicas that are siloed into Availability Zones or AWS Regions, so that the replicas can fail independently of one another. If your application needs adjusting to be siloed correctly, Route 53 ARC suggests changes that you can make to your architecture to help ensure that it's resilient and ready for failover.

Route 53 ARC automatically detects the number of cells (representing replicas, or failure-containment units) in your application and the scope of the cells, and whether the cells are siloed by Availability Zone or by Region. Then, Route 53 ARC identifies and provides information to you about the application resources in the cells, to determine if they are correctly siloed to zones or Regions. For example, if you have cells that are scoped to specific zones, readiness checks can monitor if your load balancers and the targets behind them are also siloed to those zones.

With this information, you can determine if there are changes that you need to make to align resources in your cells to the correct zones or Regions.
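
The kind of siloing check described above can be approximated with a small sketch. It only illustrates the concept of a cross-zone dependency; the function name, the load balancer names, and the zone data are hypothetical, and Route 53 ARC gathers this information itself.

```python
def cross_zone_dependencies(cell_zone, target_zones_by_load_balancer):
    """Return {load balancer: zones it reaches outside the cell's own zone}."""
    findings = {}
    for load_balancer, zones in target_zones_by_load_balancer.items():
        outside = sorted(set(zones) - {cell_zone})
        if outside:
            findings[load_balancer] = outside
    return findings


# A cell scoped to us-east-1a: one load balancer is siloed, the other is not.
findings = cross_zone_dependencies(
    "us-east-1a",
    {
        "nlb-siloed": ["us-east-1a"],
        "nlb-leaky": ["us-east-1a", "us-east-1b"],
    },
)
```

An empty result means every load balancer and its targets stay inside the cell's zone. Any entry points at a resource to realign.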

To get started, you create DNS target resources for your application, and resource sets and readiness checks for them. For more information, see Getting architecture recommendations in Route 53 ARC.