Fault injection experiments - AWS Resilience Hub

Fault injection experiments

This section describes how to create and run fault injection experiments in AWS Resilience Hub. You run fault injection experiments to measure the resiliency of your AWS resources and the amount of time it takes to recover from application, infrastructure, Availability Zone, and AWS Region outages.

To measure resiliency, these fault injection experiments simulate outages of your AWS resources. Examples of outages include network unavailability errors, failovers, stopped processes on Amazon EC2 instances or Auto Scaling groups (ASG), boot recovery in Amazon RDS, and problems with your Availability Zone. When a fault injection experiment concludes, you can determine whether the application recovers from each outage type within the recovery time objective (RTO) defined in its resiliency policy.

The experiments in Resilience Hub provide AWS Systems Manager (Systems Manager) automation documents that define the experiments you want Systems Manager to perform. The Systems Manager automation documents:

  • Implement different failure scenarios.

  • Validate that alarms are triggered when a failure happens.

  • Validate that the application can recover when the failure scenario completes.
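
The three responsibilities above can be illustrated with a minimal sketch of an automation document, built here as a Python dictionary. This is not one of the documents that Resilience Hub provides. The step names, the choice of an Amazon RDS forced failover as the failure, and the function build_experiment_document are all assumptions made for illustration. The action types (aws:executeAwsApi, aws:waitForAwsResourceProperty) come from the Systems Manager automation action reference.

```python
import json


def build_experiment_document(db_instance_id: str, alarm_name: str) -> dict:
    """Return a hypothetical automation document as a plain dict.

    Illustrative only: the step names and the RDS failover scenario are
    assumptions, not the content of a Resilience Hub document.
    """
    return {
        "schemaVersion": "0.3",
        "description": "Illustrative failover experiment (hypothetical)",
        "mainSteps": [
            {
                # Step 1: implement the failure scenario.
                "name": "InjectFailure",
                "action": "aws:executeAwsApi",
                "inputs": {
                    "Service": "rds",
                    "Api": "RebootDBInstance",
                    "DBInstanceIdentifier": db_instance_id,
                    "ForceFailover": True,
                },
            },
            {
                # Step 2: validate that the alarm reaches the ALARM state.
                "name": "AssertAlarmTriggered",
                "action": "aws:waitForAwsResourceProperty",
                "timeoutSeconds": 600,
                "inputs": {
                    "Service": "cloudwatch",
                    "Api": "DescribeAlarms",
                    "AlarmNames": [alarm_name],
                    "PropertySelector": "$.MetricAlarms[0].StateValue",
                    "DesiredValues": ["ALARM"],
                },
            },
            {
                # Step 3: validate that the resource recovers.
                "name": "WaitForRecovery",
                "action": "aws:waitForAwsResourceProperty",
                "timeoutSeconds": 1200,
                "inputs": {
                    "Service": "rds",
                    "Api": "DescribeDBInstances",
                    "DBInstanceIdentifier": db_instance_id,
                    "PropertySelector": "$.DBInstances[0].DBInstanceStatus",
                    "DesiredValues": ["available"],
                },
            },
        ],
    }


if __name__ == "__main__":
    # Print the document as JSON, one of the formats Systems Manager accepts.
    print(json.dumps(build_experiment_document("demo-db", "demo-alarm"), indent=2))
```

The timeout values are placeholders. A real experiment would normally take the resource identifiers as document parameters instead of embedding them.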

You can use the Systems Manager automation documents in their default state, or customize them based on your requirements. You can access the Systems Manager documents for your experiments from either the Application fault injection experiments or the Application assessment report.

For more information about Systems Manager documents, see Systems Manager document syntax and Systems Manager document automation action reference.

In the assessment report, choose a recommendation from the list of Resilience Hub experiment recommendations. Then create an AWS CloudFormation template that you can open, and copy the template path.

Use the path in AWS CloudFormation to create a stack that contains Resilience Hub fault injection experiment templates. After creating the stack, open the application to view provisioned fault injection experiment templates.
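
If you prefer to create the stack programmatically instead of in the console, the following is a minimal sketch using the AWS SDK for Python (boto3). It assumes that the copied template path is in a form that AWS CloudFormation accepts as a TemplateURL, and that the template may create IAM resources, which is why the capabilities are acknowledged. The stack name and the helper function names are hypothetical.

```python
def build_create_stack_request(stack_name: str, template_url: str) -> dict:
    """Build the keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # Assumption: the template may define IAM roles for the experiments.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def create_experiment_stack(stack_name: str, template_url: str) -> str:
    """Create the stack, wait for it to finish, and return the stack ID."""
    import boto3  # imported here so the request builder works without boto3

    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(**build_create_stack_request(stack_name, template_url))
    # Block until CloudFormation reports CREATE_COMPLETE (raises on failure).
    cfn.get_waiter("stack_create_complete").wait(StackName=response["StackId"])
    return response["StackId"]
```

Calling create_experiment_stack requires AWS credentials and permissions to create the resources in the template. The request builder is kept separate so that you can inspect the arguments before anything is created.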

A Systems Manager document contains the list of steps that comprise the experiment. Each step runs in the specified order. You can see how each step ran in a Systems Manager document when you view it in your Systems Manager account.
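
As an alternative to the console view, the per-step results can also be read through the Systems Manager API. The following sketch summarizes the step executions in the response shape returned by the boto3 get_automation_execution call. The helper names are hypothetical, and treating the statuses Failed, TimedOut, and Cancelled as failures is an assumption about what you want to flag.

```python
from typing import List, Optional, Tuple

# Assumption: these step statuses are the ones worth flagging as failures.
FAILURE_STATUSES = {"Failed", "TimedOut", "Cancelled"}


def summarize_steps(execution: dict) -> List[Tuple[str, str, str]]:
    """Return (step name, action, status) for each step, in execution order."""
    steps = execution.get("AutomationExecution", {}).get("StepExecutions", [])
    return [
        (s.get("StepName", "?"), s.get("Action", "?"), s.get("StepStatus", "?"))
        for s in steps
    ]


def first_failed_step(execution: dict) -> Optional[str]:
    """Return the name of the first step that did not succeed, if any."""
    for name, _action, status in summarize_steps(execution):
        if status in FAILURE_STATUSES:
            return name
    return None


def fetch_execution(execution_id: str) -> dict:
    """Fetch one automation execution (requires AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    ssm = boto3.client("ssm")
    return ssm.get_automation_execution(AutomationExecutionId=execution_id)
```

For example, first_failed_step(fetch_execution(execution_id)) returns None when every step completed without one of the flagged statuses.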