What is AWS Resilience Hub?

AWS Resilience Hub gives you a central place to define, validate, and track the resiliency of your AWS applications. It helps you protect your applications from disruptions and reduce recovery costs, so that you can optimize business continuity and meet compliance and regulatory requirements. Use AWS Resilience Hub to do the following:

  • Analyze your infrastructure and get recommendations to improve the resiliency of your applications. In addition to architectural guidance, the recommendations provide code for meeting your resiliency policy and for implementing tests, alarms, and standard operating procedures (SOPs) that you can deploy and run with your application in your continuous integration and continuous delivery (CI/CD) pipeline.

  • Evaluate recovery time objective (RTO) and recovery point objective (RPO) targets under different conditions.

  • Optimize business continuity while reducing recovery costs.

  • Identify and resolve issues before they occur in production.

After you deploy an application into production, you can add AWS Resilience Hub to your CI/CD pipeline to validate every build before it is released into production.

AWS Trusted Advisor now inspects your applications and reports each application's resilience score, along with an indication of whether the application meets or breaches its resilience policy (RTO/RPO targets). With the resiliency checks from AWS Trusted Advisor, you can see which applications have resiliency risks and address them in AWS Resilience Hub. For more information about working with AWS Trusted Advisor, see AWS Trusted Advisor.

Describe

Describe your applications using AWS CloudFormation stacks, including cross-Region and cross-account stacks. Alternatively, use Terraform state files. You can also describe applications by using AWS Resource Groups, or you can choose from applications that are already defined in AWS Service Catalog AppRegistry. In addition, you can add resources that are located on Amazon Elastic Kubernetes Service (Amazon EKS) clusters as optional resources.
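
As a rough illustration of the CloudFormation path, the following Python sketch registers an application and imports the resources of one or more stacks into its draft version. It assumes a boto3 `resiliencehub` client is passed in and that the `create_app` and `import_resources_to_draft_app_version` operations take the parameters shown. The function name `describe_application` and its arguments are hypothetical and exist only for this example.

```python
from typing import Iterable


def describe_application(client, name: str, stack_arns: Iterable[str]) -> str:
    """Create an application and import CloudFormation stacks into its draft version.

    `client` is assumed to behave like boto3.client("resiliencehub").
    Returns the ARN of the newly created application.
    """
    # Register the application; the response is assumed to carry the ARN
    # under response["app"]["appArn"].
    app_arn = client.create_app(name=name)["app"]["appArn"]

    # Ask the service to discover resources from the given stack ARNs.
    # The import runs asynchronously, so this sketch does not wait for it.
    client.import_resources_to_draft_app_version(
        appArn=app_arn,
        sourceArns=list(stack_arns),
    )
    return app_arn
```

With boto3 installed and credentials configured, the function could be called as `describe_application(boto3.client("resiliencehub"), "my-app", [stack_arn])`. Because the import is asynchronous, a real workflow would likely check the import status before publishing the application version.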

Define

Define the resilience policies for your applications. Each policy sets RTO and RPO targets for four types of disruption: application, infrastructure, Availability Zone, and Region.
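
To make the shape of such a policy concrete, the sketch below builds a request body with one RTO/RPO pair per disruption type. It assumes the request layout of the `create_resiliency_policy` operation in boto3, where the disruption types appear as `Software` (application), `Hardware` (infrastructure), `AZ`, and `Region`, and targets are given in seconds. The helper `build_policy_request`, the policy name, and the target values are hypothetical.

```python
from typing import Dict

# Mapping from the disruption types named in the text to the keys that the
# CreateResiliencyPolicy API is assumed to use.
DISRUPTION_KEYS = {
    "application": "Software",
    "infrastructure": "Hardware",
    "availability_zone": "AZ",
    "region": "Region",
}


def build_policy_request(name: str, tier: str,
                         targets: Dict[str, Dict[str, int]]) -> dict:
    """Build keyword arguments for a create_resiliency_policy call.

    `targets` maps a disruption type to {"rto": seconds, "rpo": seconds}.
    """
    policy = {}
    for disruption, limits in targets.items():
        if disruption not in DISRUPTION_KEYS:
            raise ValueError(f"unknown disruption type: {disruption}")
        if limits["rto"] < 0 or limits["rpo"] < 0:
            raise ValueError("RTO and RPO targets must be non-negative")
        policy[DISRUPTION_KEYS[disruption]] = {
            "rtoInSecs": limits["rto"],
            "rpoInSecs": limits["rpo"],
        }
    return {"policyName": name, "tier": tier, "policy": policy}


# Illustrative targets only: tighter objectives for local failures,
# looser ones for a full Region disruption.
request = build_policy_request(
    "example-policy",
    "Critical",
    {
        "application": {"rto": 300, "rpo": 60},
        "infrastructure": {"rto": 300, "rpo": 60},
        "availability_zone": {"rto": 900, "rpo": 300},
        "region": {"rto": 14400, "rpo": 3600},
    },
)
```

With boto3 installed and credentials configured, the request could then be sent as `boto3.client("resiliencehub").create_resiliency_policy(**request)`.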

Assess

The AWS Resilience Hub assessment uses best practices from the AWS Well-Architected Framework to analyze the components of an application and uncover potential resilience weaknesses. These weaknesses can be caused by incomplete infrastructure setup, misconfiguration, or situations where additional configuration improvements are needed.

Validate

After the application and standard operating procedures (SOPs) are updated to incorporate recommendations from the resilience assessment, you can use AWS Resilience Hub to test and verify that your application meets its resilience targets before you release it into production. AWS Resilience Hub works with AWS Fault Injection Simulator (AWS FIS), a chaos engineering service, to provide fault-injection simulations of real-world failures, such as network errors or too many open connections to a database. These simulations validate that the application recovers within the resilience targets you defined. AWS Resilience Hub also provides API operations that you can use to integrate its resilience assessment and testing into your CI/CD pipelines for ongoing resilience validation. Including resilience validation in CI/CD pipelines helps make sure that changes to the application's underlying infrastructure don't compromise resilience.
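
A pipeline step built on those API operations might look like the following Python sketch. It starts an assessment, polls until the assessment finishes, and passes only if the policy is met. It assumes a boto3 `resiliencehub` client, the `start_app_assessment` and `describe_app_assessment` operations, the status values `Success` and `Failed`, the compliance value `PolicyMet`, and `release` as the identifier of the published application version. The function `resilience_gate` and its defaults are hypothetical.

```python
import time


def resilience_gate(client, app_arn: str, app_version: str = "release",
                    assessment_name: str = "ci-check",
                    poll_seconds: float = 15.0, max_polls: int = 40) -> bool:
    """Return True only if a fresh assessment reports that the policy is met.

    `client` is assumed to behave like boto3.client("resiliencehub").
    """
    started = client.start_app_assessment(
        appArn=app_arn,
        appVersion=app_version,
        assessmentName=assessment_name,
    )
    assessment_arn = started["assessment"]["assessmentArn"]

    for _ in range(max_polls):
        assessment = client.describe_app_assessment(
            assessmentArn=assessment_arn
        )["assessment"]
        status = assessment["assessmentStatus"]
        if status == "Success":
            # The assessment ran to completion; gate on the compliance result.
            return assessment.get("complianceStatus") == "PolicyMet"
        if status == "Failed":
            # The assessment itself could not complete; treat as a failed gate.
            return False
        time.sleep(poll_seconds)

    # Timed out while the assessment was still pending or in progress.
    return False
```

In a pipeline, a wrapper script could exit with a non-zero status when `resilience_gate` returns False so that the build stops before release. Treating a timeout as a failure is a conservative design choice for this sketch; a different policy may suit pipelines with long-running assessments.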

View and track

AWS Resilience Hub provides a comprehensive view of the resilience status of your overall application portfolio through its dashboard. To help you track the resilience of applications, AWS Resilience Hub aggregates and organizes resilience events (such as an unavailable database or a failed resilience validation), alerts, and insights from services like Amazon CloudWatch, Amazon Route 53 Application Recovery Controller, and AWS FIS. AWS Resilience Hub also generates a resilience score, which indicates how fully the recommended resilience tests, alarms, and recovery SOPs have been implemented. You can use this score to measure resilience improvements over time.