Simulate - Security Pillar

Simulate

Run game days: Game days, also known as simulations or exercises, are internal events that provide a structured opportunity to practice your incident management plans and procedures during a realistic scenario. These events should exercise responders using the same tools and techniques that would be used in a real-world scenario - even mimicking real-world environments. Game days are fundamentally about being prepared and iteratively improving your response capabilities. Some of the reasons you might find value in performing game day activities include:

  • Validating readiness

  • Developing confidence – learning from simulations and training staff

  • Following compliance or contractual obligations

  • Generating artifacts for accreditation

  • Being agile – incremental improvement

  • Becoming faster and improving tools

  • Refining communication and escalation

  • Developing comfort with the rare and the unexpected

For these reasons, the value derived from participating in a simulation activity increases an organization's effectiveness during stressful events. Developing a simulation activity that is both realistic and beneficial can be a difficult exercise. Although testing your procedures or automation that handles well-understood events has certain advantages, it is just as valuable to participate in creative SIRS activities to test yourself against the unexpected and continuously improve.

Create custom simulations tailored to your environment, team, and tools. Find an issue and design your simulation around it. This could be something like a leaked credential, a server communicating with unwanted systems, or a misconfiguration that results in unauthorized exposure. Identify engineers who are familiar with your organization to create the scenario and another group to participate. The scenario should be realistic and challenging enough to be valuable. It should include the opportunity to get hands on with logging, notifications, escalations, and executing runbooks or automation. During the simulation, your responders should exercise their technical and organizational skills, and leaders should be involved to build their incident management skills. At the conclusion of the simulation, celebrate the efforts of the team and look for ways to iterate, repeat, and expand into further simulations.

AWS has created Incident Response Runbook templates that you can use not only to prepare your response efforts, but also as a basis for a simulation. When planning, a simulation can be broken into five phases.

Evidence gathering: In this phase, a team will get alerts through various means, such as an internal ticketing system, alerts from monitoring tooling, anonymous tips, or even public news. Teams then start to review infrastructure and application logs to determine the source of the compromise. This step should also involve internal escalations and incident leadership. Once identified, teams move on to containing the incident.

Contain the incident: Teams will have determined there has been an incident and established the source of the compromise. Teams now should take action to contain it, for example, by disabling compromised credentials, isolating a compute resource, or revoking a role’s permission.

Eradicate the incident: Now that they’ve contained the incident, teams will work towards mitigating any vulnerabilities in applications or infrastructure configurations that were susceptible to the compromise. This could include rotating all credentials used for a workload, modifying Access Control Lists (ACLs) or changing network configurations.

Recover from the incident: After teams have implemented best practices, they can now focus on ensuring the complete removal of the compromise. They can do this by restoring files or pieces of data that were compromised during the incident from backups or previous versions. Also, they must ensure that all suspicious activity has ceased and continue to monitor to ensure a stable state.

Post-incident debrief: This session allows teams to share learnings and increase the overall effectiveness of the organization’s incident response plan. Here the teams should review handling of the incident in detail, document lessons learned, update runbooks based on learnings, and determine if new risk assessments are required.