SEC10-BP08 Establish a framework for learning from incidents - AWS Well-Architected Framework

SEC10-BP08 Establish a framework for learning from incidents

Implementing a lessons learned framework and root cause analysis capability can not only help improve incident response capabilities, but also help prevent the incident from recurring. By learning from each incident, you can help avoid repeating the same mistakes, exposures, or misconfigurations, not only improving your security posture, but also minimizing time lost to preventable situations.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

It's important to implement a lessons learned framework that establishes and achieves, at a high level, the following points:

  • When is a lessons learned held?

  • What is involved in the lessons learned process?

  • How is a lessons learned performed?

  • Who is involved in the process and how?

  • How will areas of improvement be identified?

  • How will you ensure improvements are effectively tracked and implemented?

The framework should not focus on or blame individuals, but instead should focus on improving tools and processes.

Implementation steps

Aside from the preceding high-level outcomes listed, it’s important to make sure that you ask the right questions to derive the most value (information that leads to actionable improvements) from the process. Consider these questions to help get you started in fostering your lessons learned discussions:

  • What was the incident?

  • When was the incident first identified?

  • How was it identified?

  • What systems alerted on the activity?

  • What systems, services, and data were involved?

  • What specifically occurred?

  • What worked well?

  • What didn't work well?

  • Which process or procedures failed or failed to scale to respond to the incident?

  • What can be improved within the following areas:

    • People

      • Were the people who were needed to be contacted actually available and was the contact list up to date?

      • Were people missing training or capabilities needed to effectively respond and investigate the incident?

      • Were the appropriate resources ready and available?

    • Process

      • Were processes and procedures followed?

      • Were processes and procedures documented and available for this (type of) incident?

      • Were required processes and procedures missing?

      • Were the responders able to gain timely access to the required information to respond to the issue?

    • Technology

      • Did existing alerting systems effectively identify and alert on the activity?

      • How could we have reduced time-to-detection by 50%?

      • Do existing alerts need improvement or new alerts need to be built for this (type of) incident?

      • Did existing tools allow for effective investigation (search/analysis) of the incident?

      • What can be done to help identify this (type of) incident sooner?

      • What can be done to help prevent this (type of) incident from occurring again?

      • Who owns the improvement plan and how will you test that it has been implemented?

      • What is the timeline for the additional monitoring or preventative controls and processes to be implemented and tested?

This list isn’t all-inclusive, but is intended to serve as a starting point for identifying what the organization and business needs are and how you can analyze them in order to most effectively learn from incidents and continuously improve your security posture. Most important is getting started by incorporating lessons learned as a standard part of your incident response process, documentation, and expectations across the stakeholders.

Resources

Related documents: