Readiness check in Amazon Route 53 Application Recovery Controller

This chapter explains how to model your application in Amazon Route 53 Application Recovery Controller by creating a recovery group and cells, and then how to add readiness checks and readiness scopes so that Route 53 ARC can audit readiness for your application.

After you create readiness checks, you can monitor the readiness status of your resources. Readiness checks help you ensure, on an ongoing basis, that your standby application replica and its resources match your production replica, reflecting the capacity, routing policies, and other configuration details of your production application. If the replicas don't match, you can add capacity or change a configuration so that they are aligned again.

Important

Readiness checks are most useful for verifying, on an ongoing basis, that application replica configurations and runtime states are aligned. Readiness checks shouldn't be used to indicate whether your production replica is healthy, nor should you rely on readiness checks as a primary trigger for failover during a disaster event.

A readiness check in Route 53 ARC continually (at one-minute intervals) audits for mismatches in AWS provisioned capacity, service quotas, throttle limits, and configuration and version discrepancies for the resources included in the check. Readiness checks can notify you of these differences so that you can make sure that each replica has the same configuration setup and the same runtime state. Although readiness checks ensure that your configured capacities are consistent across replicas, you should not expect them to decide on your behalf what the capacity of your replica should be. For example, you should understand your application requirements so that you size your Auto Scaling groups with enough buffer capacity in each replica to handle the additional load if another cell becomes unavailable.

For quotas, when Route 53 ARC detects a mismatch with a readiness check, it can take steps to align the quotas for the replicas by increasing the lower quota to match the higher quota. When the quotas match, the readiness check status shows READY. (Note that this isn't an immediate update process, and the total time depends on the specific resource type and other factors.)

The first step in setting up readiness checks is to create a recovery group that represents your application. Each recovery group includes a cell for each individual failure-containment unit, or replica, of your application. Next, you create resource sets for each resource type in your application, and associate readiness checks with the resource sets. Finally, you associate the resources with readiness scopes, so you can get readiness status about the resources in a recovery group (your application) or individual cells (replicas, which are Regions or Availability Zones (AZs)).
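
The setup order described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes a client object shaped like the boto3 "route53-recovery-readiness" client, and the function name, the naming scheme for the resource set and check, and any ARNs you pass in are hypothetical. Verify the operation and parameter names against the current API reference before relying on them.

```python
def set_up_readiness(client, app_name, cell_names, resource_type, resources_by_cell):
    """Create cells, a recovery group, a resource set, and a readiness check.

    `client` is assumed to expose the same operations as the boto3
    "route53-recovery-readiness" client. `resources_by_cell` maps a cell
    name to the (hypothetical) resource ARNs that belong to that cell.
    """
    # 1. One cell per replica (failure-containment unit).
    cell_arns = {}
    for name in cell_names:
        cell_arns[name] = client.create_cell(CellName=name)["CellArn"]

    # 2. The recovery group represents the whole application.
    client.create_recovery_group(
        RecoveryGroupName=app_name, Cells=list(cell_arns.values())
    )

    # 3. One resource set per resource type. Each resource is tied to its
    #    cell through a readiness scope.
    set_name = f"{app_name}-resources"  # hypothetical naming scheme
    client.create_resource_set(
        ResourceSetName=set_name,
        ResourceSetType=resource_type,
        Resources=[
            {"ResourceArn": arn, "ReadinessScopes": [cell_arns[cell]]}
            for cell, arns in resources_by_cell.items()
            for arn in arns
        ],
    )

    # 4. The readiness check audits the resource set.
    check_name = f"{app_name}-check"  # hypothetical naming scheme
    client.create_readiness_check(
        ReadinessCheckName=check_name, ResourceSetName=set_name
    )
    return check_name
```

With real credentials, you might pass `boto3.client("route53-recovery-readiness")` as `client` and a resource type such as "AWS::AutoScaling::AutoScalingGroup". Taking the client as a parameter keeps the sketch testable with a stand-in object.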

Readiness (that is, READY or NOT READY) is based on the resources that are in the scope of the readiness check and the set of rules for a resource type. There are sets of readiness rules for each resource type, which Route 53 ARC checks use to audit resources for readiness. Whether a resource is READY or not is based on how each readiness rule is defined. All readiness rules evaluate resources, but some compare resources to each other and some look at specific information about each resource in the resource set.

By adding readiness checks, you can monitor readiness status in one of several ways: with EventBridge, in the AWS Management Console, or by using Route 53 ARC API actions. You can also monitor readiness status of resources in different contexts, including the readiness of cells and the readiness of your application. Use the cross-account authorization feature in Route 53 ARC to make it easier to set up and monitor distributed resources from a single AWS account.

Readiness checks and disaster recovery scenarios

Route 53 ARC readiness checks give you insights into whether your applications and resources are ready for recovery by helping you make sure that your applications are scaled to handle failover traffic. Readiness check statuses should not be used as a signal to indicate that a production replica is healthy. You can, however, use readiness checks as a supplement to your application and infrastructure monitoring or health checker systems to determine whether to fail away from or to a replica.

In an urgent situation or an outage, use a combination of health checks and other information to determine that your standby is scaled up, healthy, and ready for you to fail over production traffic. For example, check to see if canaries that run against your standby cell are meeting your success criteria, in addition to verifying that readiness check statuses for the standby are READY.
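
The guidance above, treating readiness status as one input alongside health signals rather than as the trigger, can be illustrated with a small Python sketch. The function name, the list-of-booleans canary format, and the 0.99 threshold are all hypothetical; substitute your own success criteria.

```python
def standby_ready_for_failover(readiness_status, canary_results, min_success_rate=0.99):
    """Return True only if the standby is READY and its canaries pass.

    `canary_results` is a list of booleans (True means the canary run
    succeeded). `min_success_rate` is a hypothetical threshold.
    """
    if readiness_status != "READY":
        return False
    if not canary_results:
        # No health evidence: readiness status alone is not a health signal.
        return False
    success_rate = sum(canary_results) / len(canary_results)
    return success_rate >= min_success_rate
```

The sketch deliberately returns False when there is no canary data, because a READY status by itself does not show that the standby is healthy.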

Be aware that Route 53 ARC readiness checks are hosted in a single AWS Region, US West (Oregon), and during an outage or disaster, readiness check information could become stale or the checks could become unavailable. For more information, see Data and control planes for Route 53 ARC.

How readiness rules determine readiness status

Route 53 ARC readiness checks determine readiness status based on the predefined rules for each resource type and the way those rules are defined. Route 53 ARC includes one group of rules for each type of resource that it supports. For example, Route 53 ARC has groups of readiness rules for Amazon Aurora clusters, Auto Scaling groups, and so on. Some readiness rules compare resources in a set to each other, and some look at specific information about each resource in the resource set.

You can't add, edit, or remove readiness rules, or groups of rules. However, you can create an Amazon CloudWatch alarm and create a readiness check to monitor the state of the alarm. For example, you can create a custom CloudWatch alarm to monitor Amazon EKS container services, and create a readiness check to audit the readiness status of the alarm.

You can view all the readiness rules for each resource type in the AWS Management Console when you create a resource set, or you can view the readiness rules later by navigating to the details page for a resource set. You can also view readiness rules in the following section: Readiness rules in Route 53 ARC.

When a readiness check audits a set of resources with a set of rules, the way each rule is defined determines whether the result will be READY or NOT READY for all the resources or if the result will be different for different resources. In addition, you can view readiness status in multiple ways. For example, you can view the readiness status of a group of resources in a resource set or view a summary of readiness status for a recovery group or a cell (that is, an AWS Region or Availability Zone, depending on how you've set up your recovery group).

The wording in each rule description explains how it evaluates the resources to determine the readiness status when that rule is applied. A rule is defined to inspect each resource or to inspect all resources in a resource set to determine readiness. Specifically, the rules work as follows:

  • The rule inspects each resource in the resource set to ensure a condition.

    • If all resources succeed, all resources are set as READY.

    • If one resource fails, that resource is set as NOT READY, and the other resources remain READY.

    For example: MskClusterState: Inspects each Amazon MSK cluster to ensure that it is in an ACTIVE state.

  • The rule inspects all resources in the resource set to ensure a condition.

    • If the condition is ensured, all resources are set as READY.

    • If any fails to meet the condition, all resources are set as NOT READY.

    For example: VpcSubnetCount: Inspects all VPC subnets to ensure that they have the same number of subnets.

  • Non-critical rule: The rule inspects all resources in the resource set to ensure a condition.

    • If any fails, the readiness status is unchanged. A rule with this behavior has a note in its description.

    For example: ElbV2CheckAzCount: Inspects each Network Load Balancer to ensure that it is attached to only one Availability Zone. Note: This rule does not affect readiness status.
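
The three rule behaviors listed above can be modeled in a few lines of Python. This is an illustrative sketch of the semantics only, not how Route 53 ARC implements its rules; the function names and the dictionary shapes used for resources are hypothetical.

```python
READY, NOT_READY = "READY", "NOT READY"

def per_resource_rule(resources, condition):
    """Each resource is judged on its own (as with MskClusterState)."""
    return {
        name: READY if condition(res) else NOT_READY
        for name, res in resources.items()
    }

def all_resources_rule(resources, condition):
    """One shared verdict for the whole set (as with VpcSubnetCount)."""
    status = READY if condition(list(resources.values())) else NOT_READY
    return {name: status for name in resources}

def non_critical_rule(resources, condition, current_status):
    """Report failing resources but leave readiness status unchanged."""
    findings = [name for name, res in resources.items() if not condition(res)]
    return dict(current_status), findings

# Hypothetical resource sets for illustration.
clusters = {"msk-east": {"state": "ACTIVE"}, "msk-west": {"state": "CREATING"}}
per = per_resource_rule(clusters, lambda c: c["state"] == "ACTIVE")

vpcs = {"vpc-east": {"subnets": 3}, "vpc-west": {"subnets": 2}}
same = all_resources_rule(vpcs, lambda rs: len({r["subnets"] for r in rs}) == 1)
```

Here `per` marks only "msk-west" as NOT READY, while `same` marks both VPCs as NOT READY because the subnet counts differ across the set.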

In addition, Route 53 ARC takes an extra step for quotas. If a readiness check detects a mismatch across cells for service quotas (the maximum value for resource creation and operations) for any supported resource, Route 53 ARC automatically raises the quota for the resource with the lower quota. This applies only to quotas (limits). For capacity, you should add additional capacity as required for your application needs.
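
As a rough illustration of the quota alignment described above, the following Python sketch computes which cells would need a higher quota so that every cell matches the highest value. It models only the outcome; the function name and input shape are hypothetical, and it says nothing about how or when Route 53 ARC actually requests the change.

```python
def quota_alignment_plan(quotas_by_cell):
    """Map each cell with a lower quota to the value it would be raised to.

    `quotas_by_cell` maps a cell name to its current quota for one
    resource type (hypothetical input shape).
    """
    target = max(quotas_by_cell.values())
    return {
        cell: target
        for cell, quota in quotas_by_cell.items()
        if quota < target
    }
```

For example, `quota_alignment_plan({"east": 100, "west": 60})` yields a plan that raises only "west", to 100. Quotas are never lowered in this model, which matches the description of raising the lower quota to meet the higher one.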

You can also set up an Amazon EventBridge notification for readiness checks, for example, when any readiness check status changes to NOT READY. Then when a configuration mismatch is detected, EventBridge sends you a notification and you can take corrective action to make sure that your application replicas are aligned and prepared for recovery. For more information, see Using Route 53 ARC with Amazon EventBridge.
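
An EventBridge rule for this case might use an event pattern like the one in the following Python sketch. The source, detail-type, and detail field values shown here are assumptions based on commonly documented Route 53 ARC readiness events; confirm them against the EventBridge section of the Route 53 ARC guide before use. The `matches` helper is a hypothetical, simplified stand-in for EventBridge matching (exact values and nested objects only) so that the pattern can be tried locally.

```python
import json

# Assumed pattern: readiness check status changes to NOT_READY.
NOT_READY_PATTERN = {
    "source": ["aws.route53-recovery-readiness"],
    "detail-type": [
        "Route 53 Application Recovery Controller readiness check status change"
    ],
    "detail": {"new-state": {"readiness-status": ["NOT_READY"]}},
}

def matches(pattern, event):
    """Simplified matcher: lists mean 'any of these values', dicts nest."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

# JSON form, suitable for passing as the event pattern of a rule.
EVENT_PATTERN_JSON = json.dumps(NOT_READY_PATTERN)
```

With boto3, the JSON string could be supplied as the `EventPattern` argument of the EventBridge `put_rule` operation, with a target such as an Amazon SNS topic for the notification.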

DNS target resource readiness checks: Auditing resiliency readiness

With DNS target resource readiness checks in Route 53 ARC, you can audit the architectural and resiliency readiness of your application. This type of readiness check continually scans your application's architecture and Amazon Route 53 routing policies to audit for cross-zone and cross-Region dependencies.

A recovery-oriented application has multiple replicas that are siloed into Availability Zones or AWS Regions, so that the replicas can fail independently of one another. If your application needs adjusting to be siloed correctly, Route 53 ARC suggests changes that you can make to update your architecture to help ensure that it's resilient and ready for failover.

Route 53 ARC automatically detects the number and the scope of cells (representing replicas, or failure-containment units) in your application, and whether the cells are siloed by Availability Zone or by Region. Then, Route 53 ARC identifies and provides information to you about the application resources in the cells, to determine if they are correctly siloed to zones or Regions. For example, if you have cells that are scoped to specific zones, readiness checks can monitor if your load balancers and the targets behind them are also siloed to those zones.

With this information, you can determine if there are changes that you need to make to align resources in your cells to the correct zones or Regions.
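
To make the idea of siloing concrete, the following Python sketch flags load balancer targets that sit outside the zone a cell is scoped to. It is a conceptual model only: the function name and the dictionary layout for load balancers and targets are hypothetical, and Route 53 ARC performs its own analysis rather than exposing this logic.

```python
def cross_zone_targets(cell_zone, load_balancer):
    """Return IDs of targets that are outside the cell's Availability Zone.

    `load_balancer` is a hypothetical dict of the form
    {"name": ..., "targets": [{"id": ..., "zone": ...}, ...]}.
    """
    return [
        target["id"]
        for target in load_balancer["targets"]
        if target["zone"] != cell_zone
    ]

# Hypothetical cell scoped to us-east-1a with one misplaced target.
lb = {
    "name": "app-lb-1a",
    "targets": [
        {"id": "i-aaa", "zone": "us-east-1a"},
        {"id": "i-bbb", "zone": "us-east-1b"},
    ],
}
misplaced = cross_zone_targets("us-east-1a", lb)
```

An empty result would mean that every target is siloed to the cell's zone; here, `misplaced` contains the one target that creates a cross-zone dependency.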

To get started, you create DNS target resources for your application, and resource sets and readiness checks for them. For more information, see Getting architecture recommendations in Route 53 ARC.