Operations testing - AWS Prescriptive Guidance

Operations testing

Like products, IT operations should be tested, end to end, on a regular cadence. Although enterprise customers have adopted operational testing for activities like disaster recovery, operational testing should be extended to other operations domains, such as incident and event management. Game-day scenarios, like fire drills, are activities that test how your processes, tools, and people react when an operations event occurs. Here are some prescriptive game-day scenarios used to test incident and event management:

  • Amazon Elastic Compute Cloud (Amazon EC2) CPU utilization stress test

  • Amazon EC2 network stress test

  • Amazon EC2 memory stress test

  • Amazon Relational Database Service (Amazon RDS) memory stress test

  • Amazon RDS storage stress test
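As an illustration of the first scenario, here is a minimal sketch of a CPU load generator that could be run on a test instance to see whether alarms, runbooks, and on-call responders react as expected. It is not an AWS tool. The function names `burn_cpu` and `run_cpu_stress` and the default of one worker per CPU core are assumptions made for this example. A team might instead use a dedicated fault-injection service or stress utility.

```python
import multiprocessing
import time
from typing import Optional


def burn_cpu(seconds: float) -> int:
    """Spin in a tight loop for roughly `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def run_cpu_stress(seconds: float, workers: Optional[int] = None) -> int:
    """Run one busy loop per worker process; return the total iteration count.

    Hypothetical helper: defaults to one worker per CPU core so that
    overall CPU utilization on the host approaches 100 percent.
    """
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(processes=workers) as pool:
        return sum(pool.map(burn_cpu, [seconds] * workers))
```

On the instance under test, `run_cpu_stress(300)` called from a script's `if __name__ == "__main__":` block would hold the CPUs busy for about five minutes, which is typically long enough for a utilization alarm with a multi-minute evaluation period to fire.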

As a best practice, start testing your IT operations with incident and event management, and then extend testing to other operational domains. You should also maintain a predetermined game-day schedule. Here are some examples.

Prod or non-prod schedule

[Figure: Prod or non-prod game-day schedule]

Prod and non-prod schedule

[Figure: Prod and non-prod game-day schedule]
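A predetermined schedule can also be kept as data so that it is easy to review and publish. The following sketch assigns each game-day scenario a date and a target environment. The two-week interval, the start date, the alternation between environments, and the `build_schedule` helper are assumptions chosen for illustration, not values taken from the schedules shown in the figures.

```python
from datetime import date, timedelta
from itertools import cycle

# Scenario names follow the list of game-day scenarios in this section.
SCENARIOS = [
    "Amazon EC2 CPU utilization stress test",
    "Amazon EC2 network stress test",
    "Amazon EC2 memory stress test",
    "Amazon RDS memory stress test",
    "Amazon RDS storage stress test",
]


def build_schedule(start, interval_days, environments, scenarios=SCENARIOS):
    """Return a list of (date, environment, scenario) tuples.

    Scenarios are spaced `interval_days` apart, and the target
    environment rotates through `environments` in order.
    """
    envs = cycle(environments)
    return [
        (start + timedelta(days=i * interval_days), next(envs), scenario)
        for i, scenario in enumerate(scenarios)
    ]


# Hypothetical example: one game day every 14 days, alternating environments.
schedule = build_schedule(date(2025, 1, 6), 14, ["non-prod", "prod"])
for day, env, scenario in schedule:
    print(f"{day.isoformat()}  {env:<8}  {scenario}")
```

Passing a single-item list such as `["non-prod"]` keeps every game day in one environment, while passing both environments alternates between them.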