REL11-BP06 Send notifications when events impact availability - AWS Well-Architected Framework (2023-04-10)

REL11-BP06 Send notifications when events impact availability

Notifications are sent upon the detection of significant events, even if the issue caused by the event was automatically resolved.

Automated healing allows your workload to be reliable. However, it can also obscure underlying problems that need to be addressed. Implement appropriate monitoring and events so that you can detect patterns of problems, including those addressed by auto healing, so that you can resolve root cause issues. Amazon CloudWatch Alarms can be invoked based on failures that occur. They can also be invoked based on automated healing actions that run. CloudWatch Alarms can be configured to send emails, or to log incidents in third-party incident tracking systems using Amazon SNS integration.

Common anti-patterns:

  • Sending alarms that no one acts upon.

  • Performing auto healing automation, but not notifying that healing was needed.

Benefits of establishing this best practice: Notifications of recovery events will ensure that you don’t ignore problems that occur infrequently.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

Resources

Related documents: