Response plans - Incident Manager

Response plans

Use response plans to prepare for incidents and to define how your team responds to them. A response plan acts as a template that is applied when an incident occurs. The template includes information about who to engage, the expected severity of the event, automatic runbooks to initiate, and metrics to monitor.

Best practices

Planning for incidents in advance saves teams operational time when an incident occurs. Consider the following best practices when designing a response plan.

  • Streamlined engagement – Identify the most appropriate team for an incident. Engaging wide distribution lists or the wrong teams causes confusion and wastes responder time during incidents.

  • Reliable escalation – Using escalation plans rather than contacts ensures that responders are effectively and reliably engaged. Even with the best intentions, responders are sometimes unreachable. Having a backup responder configured in an escalation plan covers these scenarios.

  • Runbooks – Developing runbooks that provide repeatable and understandable steps helps reduce the stress responders experience during incidents.

  • Collaboration – Use chat channels to streamline communication during incidents. Chat channels help responders stay up to date and share information with one another.

Create a response plan

Use the following procedure to create a response plan and automate incident response.

Response plan details

  1. Open the Incident Manager console, and in the left navigation, choose Response plans.

  2. Choose Create response plan.

  3. Enter a unique and identifiable response plan Name.

  4. (Optional) Enter a Display name. Use the display name to give the response plan a more user-friendly name.

Incident defaults

  1. Enter an incident title. The incident title helps to identify an incident on the incidents home page.

  2. To indicate the potential scope of the incident, choose an Impact.

  3. (Optional) Provide a brief summary of the incident.

  4. (Optional) Provide a dedupe string. Incident Manager uses the dedupe string to prevent the same root cause from creating multiple incidents in the same account. Incident Manager deduplicates incidents created from the same CloudWatch alarm or EventBridge event into the same incident.

  5. (Optional) Provide tag keys and values to assign to incidents created from this response plan. To set incident tags within the response plan, you must have the TagResource permission for the incident record resource.
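
The incident defaults above can also be expressed as a small data structure. The following Python sketch builds one as a plain dictionary and makes no AWS calls. The field names (title, impact, summary, dedupeString, incidentTags) are assumed to match the incident template shape of the Incident Manager CreateResponsePlan API and should be verified against the API reference. The helper name build_incident_template and all sample values are hypothetical.

```python
from typing import Dict, Optional


def build_incident_template(
    title: str,
    impact: int,
    summary: Optional[str] = None,
    dedupe_string: Optional[str] = None,
    incident_tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build incident defaults; optional fields are left out when not set."""
    if not title:
        raise ValueError("title is required")
    # Assumption: impact is an integer from 1 (critical) to 5 (no impact).
    if impact not in range(1, 6):
        raise ValueError("impact must be between 1 and 5")

    template = {"title": title, "impact": impact}
    if summary:
        template["summary"] = summary
    if dedupe_string:
        template["dedupeString"] = dedupe_string
    if incident_tags:
        template["incidentTags"] = dict(incident_tags)
    return template


# Hypothetical sample values for illustration only.
template = build_incident_template(
    title="Checkout latency above threshold",
    impact=2,
    dedupe_string="checkout-latency",
    incident_tags={"team": "payments"},
)
print(template)
```

Leaving optional fields out of the dictionary, instead of sending empty values, mirrors the optional steps in the console procedure.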

(Optional) Chat channel

  1. Choose a chat channel for the incident responders to interact in during an incident. For more information about chat channels, see Chat channels.

    Important

    Incident Manager must have permissions to publish to the chat channel's SNS topic. Without permissions to publish to the SNS topic, you can't add it to the response plan. Incident Manager verifies permissions by publishing a test notification to the SNS topic.

  2. (Optional) Choose additional SNS topics to publish to during the incident. Adding SNS topics in multiple Regions increases redundancy in case a Region is down at the time of the incident.
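
To check whether a set of SNS topics actually spans more than one Region, you can read the Region from each topic ARN. The following Python sketch assumes the standard SNS topic ARN layout (arn:partition:sns:region:account-id:topic-name). The helper name sns_topic_regions and the sample ARNs are hypothetical.

```python
from typing import List, Set


def sns_topic_regions(topic_arns: List[str]) -> Set[str]:
    """Return the distinct Regions found in a list of SNS topic ARNs."""
    regions = set()
    for arn in topic_arns:
        # Expected layout: arn:partition:sns:region:account-id:topic-name
        parts = arn.split(":", 5)
        if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sns" or not parts[3]:
            raise ValueError(f"not an SNS topic ARN: {arn}")
        regions.add(parts[3])
    return regions


# Hypothetical topic ARNs for illustration only.
topics = [
    "arn:aws:sns:us-east-1:111122223333:incident-chat",
    "arn:aws:sns:us-west-2:111122223333:incident-chat-backup",
]
regions = sns_topic_regions(topics)
print(sorted(regions))
```

If the result contains a single Region, the topics provide no redundancy against a Regional outage.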

(Optional) Engagements

  • For Engagement, choose any number of contacts and escalation plans. For information about contact and escalation plan creation, see Contacts and Escalation plans.

(Optional) Runbook

  1. To select a runbook, do one of the following:

    • To use a runbook that already exists, choose Select an existing runbook. Select the Owner, Runbook, and Version. For information about runbook creation, see Runbooks and automation.

    • To create a runbook from a template, choose Clone runbook from template. Enter a descriptive runbook name.

  2. Either choose an existing role or use the following steps to create a new role. The role must allow the ssm:StartAutomationExecution action for your specific runbook. For the runbook to work across accounts, the role must also allow the sts:AssumeRole action for the AWS-SystemsManager-AutomationExecutionRole role that you created during Cross-Region and cross-account incident management.

    1. Open the IAM console at https://console.aws.amazon.com/iam/.

    2. Choose Roles from the left navigation and choose Create role.

    3. Choose Incident Manager and choose the Incident Manager use case.

    4. Choose Next: Permissions.

    5. Choose Create policy and then choose the JSON tab.

    6. Copy the following policy and paste it into the JSON editor. In the runbook's ARN, replace the account number (111122223333) and the runbook name (DocumentName) with your own values.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Resource": "arn:aws:ssm:*:111122223333:automation-definition/DocumentName:*",
                  "Action": "ssm:StartAutomationExecution"
              },
              {
                  "Effect": "Allow",
                  "Resource": "arn:aws:iam::*:role/AWS-SystemsManager-AutomationExecutionRole",
                  "Action": "sts:AssumeRole"
              },
              {
                  "Effect": "Allow",
                  "Resource": "arn:aws:ssm-incidents:*:*:*",
                  "Action": "ssm-incidents:*"
              },
              {
                  "Effect": "Allow",
                  "Resource": "arn:aws:ssm-contacts:*:*:*",
                  "Action": "ssm-contacts:*"
              }
          ]
      }

    7. Choose Next: Tags and (optional) add tags to your policy.

    8. Choose Next: Review.

    9. Provide a Name and (optional) provide a Description for the policy.

    10. Choose Create policy.

    11. Navigate back to the role you were creating and search for the policy you created. Select the policy.

    12. (Optional) Add tags to your role.

    13. Provide a Role name and (optional) update the Role description.

    14. Choose Create role.

  3. Navigate back to the response plan you are creating and refresh the Role name dropdown.

  4. Select the role you created.

  5. Choose the Execution target.
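
If you create roles for several runbooks, substituting the account number and runbook name by hand in the policy from the role procedure is error-prone. The following Python sketch generates the same four statements from two inputs and makes no AWS calls. The helper name build_runbook_policy is hypothetical, and the 12-digit check on the account ID is an added safeguard, not something the console requires.

```python
import json


def build_runbook_policy(account_id: str, document_name: str) -> dict:
    """Return the runbook role policy with the runbook ARN filled in."""
    if not (account_id.isascii() and account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit AWS account ID")
    if not document_name:
        raise ValueError("document_name is required")

    runbook_arn = (
        f"arn:aws:ssm:*:{account_id}:automation-definition/{document_name}:*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Lets the role start this specific runbook.
            {"Effect": "Allow", "Resource": runbook_arn,
             "Action": "ssm:StartAutomationExecution"},
            # Needed for the runbook to work across accounts.
            {"Effect": "Allow",
             "Resource": "arn:aws:iam::*:role/AWS-SystemsManager-AutomationExecutionRole",
             "Action": "sts:AssumeRole"},
            {"Effect": "Allow", "Resource": "arn:aws:ssm-incidents:*:*:*",
             "Action": "ssm-incidents:*"},
            {"Effect": "Allow", "Resource": "arn:aws:ssm-contacts:*:*:*",
             "Action": "ssm-contacts:*"},
        ],
    }


# Placeholder values taken from the policy example in this procedure.
policy = build_runbook_policy("111122223333", "DocumentName")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

You can paste the printed JSON into the JSON tab of the IAM policy editor.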

Add tags and create the response plan

  1. (Optional) Add tags to your response plan.

  2. Choose Create response plan.
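
The console sections above correspond to one request with required and optional parts. The following Python sketch assembles such a request as a plain dictionary and makes no AWS calls. The key names (name, displayName, incidentTemplate, chatChannel.chatbotSns, engagements, actions.ssmAutomation, tags) are assumed to match the Incident Manager CreateResponsePlan API and should be verified against the API reference. The helper name build_response_plan_request and all ARNs and names are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def build_response_plan_request(
    name: str,
    incident_title: str,
    impact: int,
    display_name: Optional[str] = None,
    chat_sns_topic_arns: Optional[List[str]] = None,
    engagement_arns: Optional[List[str]] = None,
    runbook: Optional[dict] = None,
    tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble a response plan request; optional sections are omitted if unset."""
    if not name or not incident_title:
        raise ValueError("name and incident_title are required")

    request = {
        "name": name,
        "incidentTemplate": {"title": incident_title, "impact": impact},
    }
    if display_name:
        request["displayName"] = display_name
    if chat_sns_topic_arns:
        request["chatChannel"] = {"chatbotSns": list(chat_sns_topic_arns)}
    if engagement_arns:
        # Contacts and escalation plans are both referenced by ARN.
        request["engagements"] = list(engagement_arns)
    if runbook:
        request["actions"] = [{"ssmAutomation": dict(runbook)}]
    if tags:
        request["tags"] = dict(tags)
    return request


# Hypothetical values for illustration only.
request = build_response_plan_request(
    name="checkout-latency-plan",
    incident_title="Checkout latency above threshold",
    impact=2,
    chat_sns_topic_arns=["arn:aws:sns:us-east-1:111122223333:incident-chat"],
    engagement_arns=["arn:aws:ssm-contacts:us-east-1:111122223333:contact/example-oncall"],
    runbook={
        "documentName": "DocumentName",
        "roleArn": "arn:aws:iam::111122223333:role/ExampleRunbookRole",
        # Assumed value for the Execution target choice.
        "targetAccount": "RESPONSE_PLAN_OWNER_ACCOUNT",
    },
)
print(sorted(request))
```

A dictionary of this shape could then be passed as keyword arguments to an SDK call such as the boto3 ssm-incidents client's create_response_plan. The sketch deliberately does not make that call.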