Prepare - Security Pillar

Prepare

During an incident, your incident response teams must have access to various tools and the workload resources involved in the incident. Make sure that your teams have appropriate pre-provisioned access to perform their duties before an event occurs. All tools, access, and plans should be documented and tested before an event occurs to make sure that they can provide a timely response.

Identify key personnel and external resources: When you define your approach to incident response in the cloud, in unison with other teams (such as your legal counsel, leadership, business stakeholders, AWS Support Services, and others), you must identify key personnel, stakeholders, and relevant contacts. To reduce dependency and decrease response time, make sure that your team, specialist security teams, and responders are educated about the services that you use and have opportunities to practice hands-on.

We encourage you to identify external AWS security partners that can provide you with outside expertise and a different perspective to augment your response capabilities. Your trusted security partners can help you identify potential risks or threats that you might not be familiar with.

Develop incident management plans: Create plans to help you respond to, communicate during, and recover from an incident. For example, you can start at incident response plan with the most likely scenarios for your workload and organization. Include how you would communicate and escalate both internally and externally. Create incident response plans in the form of playbooks, starting with the most likely scenarios for your workload and organization. These might be events that are currently generated. If you need a starting place, you should look at AWS Trusted Advisor and Amazon GuardDuty findings. Use a simple format such as markdown so it’s easily maintained but ensure that important commands or code snippets are included so they can be executed without having to lookup other documentation.

Start simple and iterate. Work closely with your security experts and partners to identify the tasks required to ensure that the processes are possible. Define the manual descriptions of the processes you perform. After this, test the processes and iterate on the runbook pattern to improve the core logic of your response. Determine what the exceptions are, and what the alternative resolutions are for those scenarios. For example, in a development environment, you might want to terminate a misconfigured Amazon EC2 instance. But, if the same event occurred in a production environment, instead of terminating the instance, quarantine the instance by snapshotting volumes, and capturing memory. In addition, verify with stakeholders that critical data will not be lost, and evidence is collected before shutdown. Include how you would communicate and escalate both internally and externally. When you are comfortable with the manual response to the process, automate it to reduce the time to resolution.

Pre-provision access: Ensure that incident responders have the correct access pre-provisioned into AWS and other relevant systems to reduce the time for investigation through to recovery. Determining how to get access for the right people during an incident delays the time it takes to respond, and can introduce other security weaknesses if access is shared or not properly provisioned while under pressure. You must know what level of access your team members require (for example, what kinds of actions they are likely to take) and you must provision access in advance. Access in the form of roles or users created specifically to respond to a security incident are often privileged in order to provide sufficient access. Therefore, use of these user accounts should be restricted, they should not be used for daily activities, and usage alerted on.

Pre-deploy tools: Ensure that security personnel have the right tools pre-deployed into AWS to reduce the time for investigation through to recovery.

To automate security engineering and operations functions, you can use a comprehensive set of APIs and tools from AWS. You can fully automate identity management, network security, data protection, and monitoring capabilities and deliver them using popular software development methods that you already have in place. When you build security automation, your system can monitor, review, and initiate a response, rather than having people monitor your security position and manually react to events. An effective way to automatically provide searchable and relevant log data across AWS services to your incident responders is to enable Amazon Detective.

If your incident response teams continue to respond to alerts in the same way, they risk alert fatigue. Over time, the team can become desensitized to alerts and can either make mistakes handling ordinary situations or miss unusual alerts. Automation helps avoid alert fatigue by using functions that process the repetitive and ordinary alerts, leaving humans to handle the sensitive and unique incidents. Integrating anomaly detection systems, such as GuardDuty, CloudTrail Insights, and CloudWatch Anomaly Detection, can reduce the burden of common threshold-based alerts.

You can improve manual processes by programmatically automating steps in the process. After you define the remediation pattern to an event, you can decompose that pattern into actionable logic, and write the code to perform that logic. Responders can then execute that code to remediate the issue. Over time, you can automate more and more steps, and ultimately automatically handle whole classes of common incidents.

For tools that execute within the operating system of your EC2 instance, you should evaluate using the AWS Systems Manager Run Command, which enables you to remotely and securely administrate instances using an agent that you install on your Amazon EC2 instance operating system. It requires the AWS Systems Manager Agent (SSM Agent), which is installed by default on many Amazon Machine Images (AMIs). Be aware, though, that once an instance has been compromised, no responses from tools or agents running on it should be considered trustworthy.

Prepare forensic capabilities: It’s important for your incident responders to have an understanding of when and how the forensic investigation fits into your response plan. Your organization should define what evidence is collected and what tools are used in the process. Identify and prepare forensic investigation capabilities that are suitable, including external specialists, tools, and automation. A key decision that you should make upfront is if you will collect data from a “live” system. Some data, such as the contents of volatile memory or active network connections, will be lost if the system is powered off or rebooted.

Your response team can combine tools, such as AWS System Manager, Amazon EventBridge, and AWS Lambda, to automatically run forensic tools within an operating system and VPC traffic mirroring to obtain a network packet capture, to gather non-persistent evidence. Conduct other activities, such as log analysis or analyzing disk images, in a dedicated security account with customized forensic workstations and tools accessible to your responders.

Routinely ship relevant logs to a data store that provides high durability and integrity. Responders should have access to those logs. AWS offers several tools that can make log investigation easier, such as Amazon Athena, Amazon OpenSearch Service (OpenSearch Service), and CloudWatch Log Insights. Additionally, preserve evidence securely using S3 Object Lock. This service follows the WORM (write-once-read-many) model and prevents objects from being deleted or overwritten for a defined period of time. As forensic investigation techniques require specialist training, you might need to engage external specialists.