Automate incident response and forensics - AWS Prescriptive Guidance

Automate incident response and forensics

Created by Lucas Kauffman (AWS) and Tomek Jakubowski (AWS)

Code repository: aws-automated-incident-response-and-forensics

Environment: Production

Technologies: Security, identity, compliance

AWS services: Amazon EC2; AWS Lambda; Amazon S3; AWS Security Hub; AWS Identity and Access Management

Summary

This pattern deploys a set of processes that use AWS Lambda functions to provide the following:

  • A way to initiate the incident-response process with minimal knowledge

  • Automated, repeatable processes that are aligned with the AWS Security Incident Response Guide

  • Separation of accounts to operate the automation steps, store artifacts, and create forensic environments

The Automated Incident Response and Forensics framework follows a standard digital forensic process consisting of the following phases:

  1. Containment

  2. Acquisition

  3. Examination

  4. Analysis

You can perform investigations on static data (for example, acquired memory or disk images) and on dynamic data that is live but on separate systems.

For more details, see the Additional information section.

Prerequisites and limitations

Prerequisites 

  • Two AWS accounts:

    • Security account, which can be an existing account, but is preferably new

    • Forensics account, preferably new

  • AWS Organizations set up

  • In the Organizations member accounts:

    • The IAM role attached to the Amazon Elastic Compute Cloud (Amazon EC2) instances must have Get and List access to Amazon Simple Storage Service (Amazon S3), and the instances must be manageable by AWS Systems Manager. We recommend using the AmazonSSMManagedInstanceCore AWS managed policy for this role. Note that this role is automatically attached to the EC2 instance when incident response is initiated. After the response has finished, AWS Identity and Access Management (IAM) removes all rights to the instance.

    • Virtual private cloud (VPC) endpoints in the AWS member account and in the Incident Response and Analysis VPCs. Those endpoints are: S3 Gateway, EC2 Messages, SSM, and SSM Messages.

  • AWS Command Line Interface (AWS CLI) installed on the EC2 instances. If the EC2 instances don’t have AWS CLI installed, internet access will be required for the disk snapshot and memory acquisition to work. In this case, the scripts will reach out to the internet to download the AWS CLI installation files and will install them on the instances.
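
The four VPC endpoints listed in the prerequisites map to regional endpoint service names. The following Python sketch is illustrative only and is not taken from the pattern's repository; the helper name required_endpoint_services and the example Region are assumptions. It builds the service names that you would supply when creating each endpoint.

```python
from typing import Dict

# Endpoint services named in the prerequisites, with their endpoint type.
# Amazon S3 uses a gateway endpoint; the Systems Manager services use interface endpoints.
REQUIRED_ENDPOINTS = {
    "s3": "Gateway",
    "ec2messages": "Interface",
    "ssm": "Interface",
    "ssmmessages": "Interface",
}


def required_endpoint_services(region: str) -> Dict[str, str]:
    """Map each required endpoint service name in the given Region to its endpoint type."""
    return {
        f"com.amazonaws.{region}.{service}": kind
        for service, kind in REQUIRED_ENDPOINTS.items()
    }


if __name__ == "__main__":
    # The Region is an example value; use the Region of your VPC.
    for name, kind in required_endpoint_services("eu-west-1").items():
        print(f"{kind:<9} {name}")
```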

Limitations 

  • This framework is not intended to generate artifacts that can be considered electronic evidence admissible in court.

  • Currently, this pattern supports only Linux-based instances running on x86 architecture.

Architecture

Target technology stack

  • AWS CloudFormation

  • AWS CloudTrail

  • AWS Config

  • IAM

  • Lambda

  • Amazon S3

  • AWS Key Management Service (AWS KMS)

  • AWS Security Hub

  • Amazon Simple Notification Service (Amazon SNS)

  • AWS Step Functions

Target architecture 

In addition to the member account, the target environment consists of two main accounts: a Security account and a Forensics account. Two accounts are used for the following reasons:

  • To separate them from any other customer accounts to reduce blast radius in case of a failed forensic analysis

  • To help ensure the isolation and protection of the integrity of the artifacts being analyzed

  • To keep the investigation confidential

  • To avoid situations where threat actors exhaust the service quotas of your compromised AWS account and so prevent you from launching an Amazon EC2 instance to perform investigations

Also, having separate Security and Forensics accounts allows for creating separate roles: a Responder for acquiring evidence and an Investigator for analyzing it. Each role would have access to its own account.

The following diagram shows only the interaction between the accounts. Details of each account are shown in subsequent diagrams, and a complete diagram is attached.

Interaction between member, security, and forensics accounts and users, the internet, and Slack.

The following diagram shows the member account.

Account includes an AWS KMS key, IAM roles, Lambda functions, endpoints, and the Member VPC with two EC2 instances.

1. An event is sent to the Slack Amazon SNS topic.

The following diagram shows the Security account.

Account with the EC2DdCopyInstance in the incident response VPC, plus Step Functions, IAM roles, endpoints, and LiME memory modules.

2. The SNS topic in the Security account initiates Forensics events.

The following diagram shows the Forensics account.

Account with EC2 instance images, IAM roles, S3 buckets for analytics, maintenance, and logging, an Analysis VPC, and a Maintenance VPC, which connects to the internet.

The Security account is where the two main AWS Step Functions workflows are created for memory and disk image acquisition. After the workflows are running, they access the member account that has the EC2 instances involved in an incident, and they initiate a set of Lambda functions that will gather a memory dump or a disk dump. Those artifacts are then stored in the Forensics account.

The Forensics account will hold the artifacts gathered by the Step Functions workflow in the Analysis artifacts S3 bucket. The Forensics account will also have an EC2 Image Builder pipeline that builds an Amazon Machine Image (AMI) of a Forensics instance. Currently, the image is based on SANS SIFT Workstation. 

The build process uses the Maintenance VPC, which has connectivity to the internet. The image can later be used to launch an EC2 instance in the Analysis VPC for analysis of the gathered artifacts.

The Analysis VPC does not have internet connectivity. By default, the pattern creates three private analysis subnets. You can create up to 200 subnets, which is the default quota for the number of subnets in a VPC, but you must add those subnets to the VPC endpoints so that AWS Systems Manager Session Manager can automate running commands in them.

From a best-practices perspective, we recommend using AWS CloudTrail and AWS Config to do the following: 

  • Track changes made in your Forensics account

  • Monitor access and integrity of the artifacts that are stored and analyzed

Workflow

The following diagram shows the key steps of a workflow that includes the process and decision tree from when an instance is compromised until it is analyzed and contained.

  1. Has the SecurityIncidentStatus tag been set with the value Analyze? If yes, do the following:

    1. Attach the correct IAM profiles for AWS Systems Manager and Amazon S3.

    2. Send an Amazon SNS message to the Slack SNS topic.

    3. Send an Amazon SNS message to the SecurityIncident SNS topic.

    4. Invoke the Memory and Disk Acquisition state machine.

  2. Have memory and disk been acquired? If no, there is an error.

  3. Tag the EC2 instance with the Contain tag.

  4. Attach the IAM role and security group to fully isolate the instance.
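
The decision tree above can be sketched as a small planning function. This is a simplified model and not the framework's actual Lambda code; the Instance class, the plan_response function, and the step descriptions are hypothetical names introduced for illustration. The sketch assumes that containment still runs after a failed acquisition, which matches the framework's described behavior of retagging with Contain on failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

TAG_KEY = "SecurityIncidentStatus"


@dataclass
class Instance:
    """Minimal stand-in for an EC2 instance and its tags (hypothetical type)."""
    instance_id: str
    tags: Dict[str, str] = field(default_factory=dict)


def plan_response(instance: Instance, acquisition_succeeded: bool) -> List[str]:
    """Return the ordered workflow steps for an instance, or an empty list if it is not tagged."""
    # Step 1: nothing happens unless the tag is set to Analyze.
    if instance.tags.get(TAG_KEY) != "Analyze":
        return []
    steps = [
        "attach IAM profile for Systems Manager and Amazon S3",
        "publish SNS message for Slack",
        "publish SNS message for SecurityIncident",
        "invoke memory and disk acquisition state machine",
    ]
    # Step 2: a failed acquisition is reported as an error.
    if not acquisition_succeeded:
        steps.append("report acquisition error")
    # Steps 3 and 4: containment happens in either case.
    steps.append("tag instance with Contain")
    steps.append("attach isolating IAM role and security group")
    return steps


if __name__ == "__main__":
    tagged = Instance("i-0123456789abcdef0", {TAG_KEY: "Analyze"})
    for step in plan_response(tagged, acquisition_succeeded=True):
        print(step)
```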

Automation and scale

The intent of this pattern is to provide a scalable solution to perform incident response and forensics across several accounts within a single AWS Organizations organization.

Tools

AWS Services

  • AWS CloudFormation helps you set up AWS resources, provision them quickly and consistently, and manage them throughout their lifecycle across AWS accounts and Regions.

  • AWS Command Line Interface (AWS CLI) is an open-source tool for interacting with AWS services through commands in your command-line shell.

  • AWS Identity and Access Management (IAM) helps you securely manage access to your AWS resources by controlling who is authenticated and authorized to use them.

  • AWS Key Management Service (AWS KMS) helps you create and control cryptographic keys to protect your data.

  • AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use.

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

  • AWS Security Hub provides a comprehensive view of your security state in AWS. It also helps you check your AWS environment against security industry standards and best practices.

  • Amazon Simple Notification Service (Amazon SNS) helps you coordinate and manage the exchange of messages between publishers and clients, including web servers and email addresses.

  • AWS Step Functions is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications. 

  • AWS Systems Manager helps you manage your applications and infrastructure running in the AWS Cloud. It simplifies application and resource management, shortens the time to detect and resolve operational problems, and helps you manage your AWS resources securely at scale.

Code 

For the code and specific implementation and usage guidance, see the GitHub Automated Incident Response and Forensics Framework repository.

Epics

Task: Deploy CloudFormation templates.

Description:

The CloudFormation templates are numbered 1 through 7, and the first word after the number in each template name indicates the account in which the template must be deployed. Note that the order in which you deploy the CloudFormation templates is important.

  • 1-forensic-AnalysisVPCnS3Buckets.yaml: Deployed in the forensics account. It creates the S3 buckets and the Analysis VPC, and it activates CloudTrail.

  • 2-forensic-MaintenanceVPCnEC2ImageBuilderPipeline.yaml: Deploys the maintenance VPC and image builder pipeline based on SANS SIFT.

  • 3-security_IR-Disk_Mem_automation.yaml: Deploys the functions in the security account that enable disk and memory acquisition.

  • 4-security_LiME_Volatility_Factory.yaml: Initiates a build function to start creating the memory modules based on the given AMI IDs. Note that AMI IDs are different across AWS Regions. Whenever you need new memory modules, you can rerun this script with the new AMI IDs. Consider integrating this with your golden image AMI builder pipelines (if used in your environment).

  • 5-member-IR-automation.yaml: Creates the member incident-response automation function, which initiates the incident-response process. It allows sharing Amazon Elastic Block Store (Amazon EBS) volumes across accounts, automated posting to Slack channels during the incident-response process, initiating the forensics process, and isolating the instances after the process finishes.

  • 6-forensic-artifact-s3-policies.yaml: After all the scripts have been deployed, this script fixes the permissions required for all the cross-account interactions.

  • 7-security-IR-vpc.yaml: Configures a VPC used for incident response volume processing.

To initiate the incident response framework for a specific EC2 instance, create a tag on the instance with the key SecurityIncidentStatus and the value Analyze. This initiates the member Lambda function, which automatically starts isolation as well as memory and disk acquisition.

Skills required: AWS administrator
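
Initiating the framework for an instance amounts to a single tagging call. The following Python sketch shows one way to do this; it is not taken from the repository. It assumes that you pass in an Amazon EC2 client created with boto3 and credentials for the member account. The function names and the example instance ID are hypothetical.

```python
from typing import Any, Dict

TAG_KEY = "SecurityIncidentStatus"


def build_tag_request(instance_id: str, value: str = "Analyze") -> Dict[str, Any]:
    """Build the parameters for the EC2 CreateTags call that starts the response."""
    return {
        "Resources": [instance_id],
        "Tags": [{"Key": TAG_KEY, "Value": value}],
    }


def start_incident_response(ec2_client: Any, instance_id: str) -> None:
    """Tag the instance so that the member Lambda function picks it up.

    ec2_client is assumed to be boto3.client("ec2") for the member account.
    """
    ec2_client.create_tags(**build_tag_request(instance_id))


# Example usage (requires boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   start_incident_response(boto3.client("ec2"), "i-0123456789abcdef0")
```

Passing the client in as a parameter keeps the helper easy to test with a stub and lets you choose the account and Region when you create the client.
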

Task: Operate the framework.

Description:

When acquisition finishes or fails, the Lambda function retags the instance with Contain. This initiates containment, which fully isolates the instance by attaching a security group that allows no inbound or outbound traffic and an IAM role that disallows all access.

Follow the steps in the GitHub repository.

Skills required: AWS administrator
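
The two containment controls described above can be approximated as follows. This is a minimal sketch and not the framework's implementation: the function names and the group name are hypothetical, the client is assumed to be an Amazon EC2 client created with boto3, and the VPC is assumed to be IPv4 only. A new security group has no inbound rules but does have a default rule that allows all outbound traffic, so the sketch revokes that rule.

```python
from typing import Any, Dict

# Default egress rule that Amazon EC2 adds to every new security group in a VPC.
DEFAULT_EGRESS = {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}


def deny_all_policy() -> Dict[str, Any]:
    """IAM policy document that explicitly denies every action on every resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Deny", "Action": "*", "Resource": "*"}],
    }


def create_isolation_group(ec2_client: Any, vpc_id: str, name: str = "ir-isolation") -> str:
    """Create a security group with no inbound and no outbound rules; return its ID.

    ec2_client is assumed to be boto3.client("ec2") for the account that owns the VPC.
    """
    group_id = ec2_client.create_security_group(
        GroupName=name,
        Description="Incident response isolation: no traffic allowed",
        VpcId=vpc_id,
    )["GroupId"]
    # Remove the default allow-all egress rule so that no traffic can leave.
    ec2_client.revoke_security_group_egress(
        GroupId=group_id, IpPermissions=[DEFAULT_EGRESS]
    )
    return group_id
```
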

Task: Deploy the custom Security Hub actions by using a CloudFormation template.

Description:

To create a custom action so that you can use the dropdown list from Security Hub, deploy the Modules/SecurityHub Custom Actions/SecurityHubCustomActions.yaml CloudFormation template. Then modify the IRAutomation role in each of the member accounts to allow the Lambda function that runs the action to assume the IRAutomation role. For more information, see the GitHub repository.

Skills required: AWS administrator
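
Allowing the Lambda function to assume the IRAutomation role means adding a statement to that role's trust policy. The following Python sketch shows the shape of such a statement; it is an illustration under assumptions and not the repository's code. The function name and the example role ARN are hypothetical, and the principal is assumed to be the execution role of the Lambda function that runs the custom action.

```python
import copy
import json
from typing import Any, Dict


def allow_assume_role(trust_policy: Dict[str, Any], principal_arn: str) -> Dict[str, Any]:
    """Return a copy of the trust policy that also lets principal_arn assume the role."""
    updated = copy.deepcopy(trust_policy)  # leave the caller's document unchanged
    updated.setdefault("Version", "2012-10-17")
    updated.setdefault("Statement", []).append(
        {
            "Effect": "Allow",
            "Principal": {"AWS": principal_arn},
            "Action": "sts:AssumeRole",
        }
    )
    return updated


if __name__ == "__main__":
    existing = {"Version": "2012-10-17", "Statement": []}
    # Hypothetical ARN; replace it with the execution role of your Lambda function.
    lambda_role = "arn:aws:iam::111122223333:role/ExampleSecurityHubActionRole"
    print(json.dumps(allow_assume_role(existing, lambda_role), indent=2))
```

You could then apply the resulting document to the IRAutomation role in each member account, for example with the IAM UpdateAssumeRolePolicy API.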

Related resources

Additional information

By using this environment, a Security Operations Center (SOC) team can improve their security incident response process through the following:

  • Having the ability to perform forensics in a segregated environment to avoid accidental compromise of production resources

  • Having a standardized, repeatable, automated process to do containment and analysis

  • Giving any account owner or administrator the ability to initiate the incident-response process with the minimal knowledge of how to use tags

  • Having a standardized, clean environment for performing incident analysis and forensics without the noise of a larger environment

  • Having the ability to create multiple analysis environments in parallel

  • Focusing SOC resources on incident response instead of on maintenance and documentation of a cloud forensics environment

  • Moving away from a manual process toward an automated one to achieve scalability

  • Using CloudFormation templates for consistency and to avoid repetitive tasks

Additionally, you avoid using persistent infrastructure, and you pay for resources only when you need them.

Attachments

To access additional content that is associated with this document, unzip the following file: attachment.zip