Establish a framework for learning from incidents

Implementing a lessons learned framework and methodology not only improves your incident response capabilities, but also helps prevent incidents from recurring. By learning from each incident, you can avoid repeating the same mistakes, exposures, or misconfigurations, which improves your security posture and minimizes time lost to preventable situations.

It's important to implement a lessons learned framework that answers, at a high level, the following questions:

When is a lessons learned session held?
What is involved in the lessons learned process?
How is a lessons learned session performed?
Who is involved in the process and how?
How will areas of improvement be identified?
How will you ensure the improvements are effectively tracked and implemented?

Beyond these high-level points, make sure that you ask the right questions so that the process yields the most value, meaning information that leads to actionable improvements. Consider these questions as a starting point for your lessons learned discussions:

What was the incident?
When was the incident first identified?
How was it identified?
What systems alerted on the activity?
What systems, services, and data were involved?
What specifically occurred?
What worked well?
What didn't work well?
Which processes or procedures failed, or failed to scale, in responding to the incident?
What can be improved within the following areas:
- People
  - Were the people who needed to be contacted actually available, and was the contact list up to date?
  - Were people missing training or capabilities needed to effectively respond to and investigate the incident?
  - Were the appropriate resources ready and available?
- Process
  - Were processes and procedures followed?
  - Were processes and procedures documented and available for this (type of) incident?
  - Were required processes and procedures missing?
  - Were the responders able to gain timely access to the required information to respond to the issue?
- Technology
  - Did existing alerting systems effectively identify and alert on the activity?
  - Do existing alerts need improvement, or do new alerts need to be built, for this (type of) incident?
  - Did existing tooling allow for effective investigation (search/analysis) of the incident?
What can be done to help identify this (type of) incident sooner?
What can be done to help prevent this (type of) incident from occurring again?
Who owns the improvement plan and how will you test that it has been implemented?
What is the timeline for the additional monitoring/preventative controls/process to be implemented and tested?
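
The last two questions, about ownership, testing, and timeline, can be tracked in a simple structured record. The following is a minimal sketch in Python, not a prescribed format: the `ActionItem` fields, the `Area` categories (taken from the People, Process, and Technology areas in the list), and the sample owners, descriptions, and dates are all hypothetical and shown only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Area(Enum):
    """Improvement areas used in the question list (illustrative)."""
    PEOPLE = "people"
    PROCESS = "process"
    TECHNOLOGY = "technology"


@dataclass
class ActionItem:
    """One improvement identified during a lessons learned session (hypothetical schema)."""
    description: str
    area: Area
    owner: str                 # who owns the improvement
    due: date                  # agreed timeline for implementation and testing
    implemented: bool = False  # the change has been made
    tested: bool = False       # the change has been verified to work

    def is_closed(self) -> bool:
        # An item counts as done only once it is both implemented and tested.
        return self.implemented and self.tested

    def is_overdue(self, today: date) -> bool:
        return not self.is_closed() and today > self.due


def overdue_items(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return the open items whose due date has passed."""
    return [item for item in items if item.is_overdue(today)]


# Sample data, invented for illustration only.
items = [
    ActionItem("Refresh the incident contact list", Area.PEOPLE, "alice",
               date(2024, 3, 1), implemented=True, tested=True),
    ActionItem("Build an alert for this type of activity", Area.TECHNOLOGY, "bob",
               date(2024, 3, 15), implemented=True),
    ActionItem("Document the missing response procedure", Area.PROCESS, "carol",
               date(2024, 4, 30)),
]

for item in overdue_items(items, today=date(2024, 4, 1)):
    print(f"OVERDUE: {item.description} (owner: {item.owner}, due: {item.due})")
```

Treating an item as closed only when it is both implemented and tested mirrors the question of how you will test that the improvement has been implemented; an item that was changed but never verified stays open and is reported once its due date passes.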

This list isn't all-inclusive. It is intended to serve as a starting point for identifying what the needs of your organization and business are and how you can analyze them to learn from incidents most effectively and continuously improve your security posture. Most important is getting started: incorporate lessons learned as a standard part of your incident response process, documentation, and expectations across stakeholders.
