Runbooks - AWS Security Incident Response Guide


When a security anomaly is detected, containing the event and returning to a known good state are important elements of a response plan. As an example, if the anomaly occurred because of a security misconfiguration, the remediation might be as simple as removing the variance through a redeployment of the resources with the proper configuration. To do this, you will need to plan ahead and define your own security response procedures, which are often called runbooks.

A runbook is the documented form of an organization's procedures for conducting a task or series of tasks. This documentation is usually stored either in an internal digital system or on printed paper. You might currently have incident response runbooks, or you might need to create them to comply with a security assurance framework. However, when you manually follow written runbooks, you increase the likelihood of making mistakes. Instead, we recommend that you automate all of your repeatable tasks. Automation frees your response team from common tasks and makes them available for more important work, such as correlating events, practicing in simulations, devising new response procedures, performing research, developing new skills, and testing or building new tools. However, before you can decompose the tasks into programmable logic and iterate towards proper automation, you must start by writing a runbook.

Creating Runbooks

To create runbooks for the cloud, we recommend that you first focus on the alerts you currently generate. If you generate an alert, it is important to investigate it. Start by describing the manual processes that you perform. After this, test the processes and iterate on the runbook pattern to improve the core logic of your response. Determine what the exceptions are, and what the alternative resolutions are for those scenarios. For example, in a development environment, you might want to terminate a misconfigured Amazon EC2 instance. However, if the same event occurred in a production environment, instead of terminating the instance, you might stop the instance and verify with stakeholders that critical data will not be lost and that termination is acceptable.
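The development versus production exception described above can be sketched in code. The following is a minimal illustration, not a prescribed implementation: the names `Decision`, `decide_response`, and `respond_to_misconfigured_instance` are hypothetical, the environment labels are assumptions, and the sketch assumes the caller passes in an EC2 client such as the one returned by `boto3.client("ec2")`. As a cautious default, it treats any environment that is not explicitly labeled development as production.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str           # "terminate" or "stop"
    needs_approval: bool  # True when stakeholders must confirm before termination


def decide_response(environment: str) -> Decision:
    """Encode the runbook exception: only development instances are terminated outright."""
    # Assumption: environments are labeled with plain strings such as "development".
    if environment.strip().lower() == "development":
        return Decision(action="terminate", needs_approval=False)
    # Anything else is handled like production: stop, then ask stakeholders.
    return Decision(action="stop", needs_approval=True)


def respond_to_misconfigured_instance(ec2_client, instance_id: str, environment: str) -> Decision:
    """Apply the decision with an EC2 client (for example, boto3.client("ec2"))."""
    decision = decide_response(environment)
    if decision.action == "terminate":
        ec2_client.terminate_instances(InstanceIds=[instance_id])
    else:
        ec2_client.stop_instances(InstanceIds=[instance_id])
    return decision
```

Keeping the decision logic separate from the API calls makes the exception handling easy to test and review before it is allowed to act on real resources.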

After you determine the best solution, you can deconstruct the logic into a code-based solution that many responders can use as a tool to automate the response and remove variance or guesswork. This speeds up the lifecycle of a response. The next goal is to fully automate this code so that it is invoked by the alerts or events themselves, rather than run by a human responder.

Getting Started

If you're not sure where to start, consider beginning with the alerts that could be generated by AWS Trusted Advisor, AWS Security Hub's Foundational Security Best Practices, and AWS Config Rules (including the AWS Config Rules GitHub repository). Then, focus on events generated by services that describe the systems you are concerned with.

Amazon GuardDuty and Access Analyzer describe many of the domains that an application will use in AWS, which is why they are generally suggested; Amazon Inspector and Amazon Macie have more specific uses for those with endpoint and data concerns. Information about Amazon GuardDuty findings is available in the Amazon GuardDuty User Guide. Access Analyzer findings are described in the IAM User Guide. Macie findings are described in the Amazon Macie User Guide. Amazon Inspector findings are described in the Amazon Inspector User Guide. Security Hub lets you unify those findings in one place and react to them together with low latency, which is why it is suggested as a central location for remediation.

All of the above services send notifications through Amazon CloudWatch Events when any change in the findings or alerts occurs, including newly generated alerts and updates to existing alerts. You can set up Amazon CloudWatch Events rules to trigger AWS Lambda functions that perform an event-driven response. However, Security Hub also lets you build custom insights and add your own findings from the application domain, which are further reasons to use it as the central point instead. For more information, see the Event-Driven Response section.
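As a rough sketch of this event-driven pattern, the following Python code shows an event pattern that a CloudWatch Events rule could use to match GuardDuty findings, and a Lambda handler that routes each matched event to a runbook function. The registry `RUNBOOKS`, the `runbook` decorator, and `handle_guardduty_finding` are hypothetical names introduced for illustration, and the handler only reports what it would act on rather than performing a remediation.

```python
# Event pattern for a CloudWatch Events rule that matches GuardDuty findings.
EVENT_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
}

# Hypothetical registry mapping (source, detail-type) to a runbook function.
RUNBOOKS = {}


def runbook(source, detail_type):
    """Register a function as the runbook for one kind of event."""
    def register(func):
        RUNBOOKS[(source, detail_type)] = func
        return func
    return register


@runbook("aws.guardduty", "GuardDuty Finding")
def handle_guardduty_finding(detail):
    # Findings about EC2 instances carry the instance ID under resource.instanceDetails.
    instance = detail.get("resource", {}).get("instanceDetails", {})
    return {
        "runbook": "guardduty",
        "finding_type": detail.get("type"),
        "instance_id": instance.get("instanceId"),
    }


def lambda_handler(event, context):
    """Lambda entry point: route the incoming event to its registered runbook."""
    key = (event.get("source"), event.get("detail-type"))
    handler = RUNBOOKS.get(key)
    if handler is None:
        # No runbook yet; surface the gap instead of failing silently.
        return {"runbook": None, "reason": f"no runbook registered for {key}"}
    return handler(event.get("detail", {}))
```

For example, an event with source `aws.guardduty`, detail-type `GuardDuty Finding`, and a detail whose `type` is `UnauthorizedAccess:EC2/SSHBruteForce` would be routed to `handle_guardduty_finding`. Events without a registered runbook return a reason, which can help identify alerts that still need a written procedure.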