What is AWS Incident Detection and Response? - AWS Incident Detection and Response User Guide

What is AWS Incident Detection and Response?

AWS Incident Detection and Response offers eligible AWS Enterprise Support customers proactive incident engagement to reduce the potential for failure and accelerate recovery of critical workloads from disruption. Incident Detection and Response facilitates your collaboration with AWS to develop runbooks and response plans customized to each onboarded workload. A team of Incident Management Engineers (IMEs) monitors your onboarded workloads 24x7 and engages you on a call bridge within 5 minutes of a critical alarm.

Incident Detection and Response offers the following key features:

  • Improved observability: AWS experts provide guidance to help you define and correlate metrics and alarms between the application and infrastructure layers of your workload to detect disruptions early.

  • 5-minute response time: IMEs monitor your onboarded workloads 24x7 to detect critical incidents. The IMEs respond within 5 minutes of an alarm trigger or of a business-critical Support case that you raise to Incident Detection and Response.

  • Faster resolution: IMEs use pre-defined and custom runbooks developed for your workloads to respond within 5 minutes, create a Support case on your behalf, and manage incidents on your workload. IMEs provide single-threaded ownership for incidents and keep you engaged with the right AWS experts until the incident is resolved.

  • Incident management for AWS events: Because we understand the context of your critical workload (for example, accounts, services, and instances), we can detect and proactively notify you of a potential impact to your workload during an AWS service event. If requested, IMEs engage you during AWS service events and provide updates on them. While Incident Detection and Response cannot prioritize you for recovery during a service event, it does provide Support guidance to help you implement your mitigation plan.

  • Reduced potential for failure: After resolution, the IMEs provide you with a post-incident review upon request, and AWS experts work with you to apply lessons learned to improve the incident response plan and runbooks. You can also use AWS Resilience Hub for continuous resiliency tracking on your workloads.
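
To make the "Improved observability" feature more concrete, the following minimal sketch shows one possible way to correlate application-layer and infrastructure-layer alarms, using the rule-expression syntax of Amazon CloudWatch composite alarms. It is an illustration only, not the method Incident Detection and Response prescribes. The function name `build_composite_rule` and all alarm names are hypothetical, and requiring both layers to be in alarm is an assumed design choice.

```python
def build_composite_rule(app_alarms, infra_alarms):
    """Build a CloudWatch composite-alarm rule expression.

    The rule is true only when at least one application-layer alarm AND
    at least one infrastructure-layer alarm are in the ALARM state.
    (Assumed correlation policy; adjust to your workload.)
    """
    if not app_alarms or not infra_alarms:
        raise ValueError("each layer needs at least one alarm name")
    # ALARM("name") is the composite-alarm function for the ALARM state.
    app = " OR ".join(f'ALARM("{name}")' for name in app_alarms)
    infra = " OR ".join(f'ALARM("{name}")' for name in infra_alarms)
    return f"({app}) AND ({infra})"


# Hypothetical alarm names for a checkout workload.
rule = build_composite_rule(
    ["checkout-5xx-rate"],
    ["checkout-db-cpu-high", "checkout-alb-unhealthy-hosts"],
)
print(rule)
```

The resulting string could be supplied as the `AlarmRule` parameter of the CloudWatch `PutCompositeAlarm` API (for example, `put_composite_alarm` in boto3). Combining the layers with AND reduces noise from infrastructure symptoms that have no application impact. Whether that trade-off suits a given workload is something to decide with AWS experts during onboarding.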