This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.
Infrastructure protection
Note
Confirm that systems and services within your workload are protected against unintended and unauthorized access and potential vulnerabilities.
Protecting your infrastructure from unintended and unauthorized access and from potential vulnerabilities helps you elevate your security posture in the cloud. This includes protecting your AWS network and compute resources so that you can detect, contain, and stop unauthorized users. In traditional data centers, you are responsible for maintaining and protecting all IT infrastructure. AWS instead uses a shared responsibility model in which both AWS and your organization share in securing the infrastructure. AWS is responsible for protecting the infrastructure that runs all of the services offered in the cloud. Your organization is responsible for managing the guest operating system (including updates and security patches), other associated application software, and configurations such as firewalls. Your responsibility also depends on the services that you select. For a service like Amazon EC2, you are responsible for managing the guest operating system (including updates and security patches), any application software or utilities that you install on the instances, and the configuration of security groups. For a managed service like Amazon S3, you are only responsible for managing your data, choosing your encryption options, and setting up appropriate permissions.
Traditional data centers use boundary-based protection methods that define a trust boundary. They filter and restrict anything entering the trusted zone by using a stack of security solutions such as firewalls, intrusion detection systems (IDS), and intrusion prevention systems (IPS) to block malicious traffic at the boundary of the network. In contrast, infrastructure security in a cloud environment like AWS can follow a Zero Trust model, where security is applied at multiple layers. With a Zero Trust model, you don't automatically trust anyone within or outside the network. Instead, you enforce fine-grained, identity-based authorization rules. This provides a least privilege model in which every entity is given only the permissions necessary to perform its duties. For example, if an unauthorized user were able to access the system through an open port, they could not read or modify data in a database because they would not have sufficient permissions. To reduce the risk of unauthorized users causing disruptions to your cloud deployments or gaining access to your data, use defense in depth together with a Zero Trust model.
Start
We recommend using AWS Organizations in your landing zone environment that supports your multi-account strategy. Deploy security controls uniformly across your environment, and define the foundational shared services that will be used. Include network services to centrally deploy and manage your virtual private clouds (VPCs) and subnets, and share them with other accounts in the environment using AWS Resource Access Manager (AWS RAM). The shared services design should also include an account or accounts designated to deploy centrally managed security services. These can be delegated administrator consoles for AWS Config, Amazon GuardDuty, and Amazon Macie.
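As a minimal sketch of sharing centrally managed network resources, the following boto3 snippet shares a subnet from the network account with other member accounts through AWS RAM. The subnet ARN, Region, and account IDs are hypothetical placeholders.

```python
import boto3

# Hypothetical identifiers: replace with your shared-services subnet ARN and
# the member account IDs (or an organizational unit ARN) in your landing zone.
SUBNET_ARN = "arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0123456789abcdef0"
MEMBER_ACCOUNTS = ["222222222222", "333333333333"]

ram = boto3.client("ram", region_name="us-east-1")

# Create a resource share owned by the network account and attach the subnet.
response = ram.create_resource_share(
    name="shared-network-subnets",
    resourceArns=[SUBNET_ARN],
    principals=MEMBER_ACCOUNTS,
    allowExternalPrincipals=False,  # keep sharing inside the organization
)
print(response["resourceShare"]["resourceShareArn"])
```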
Next, define your infrastructure segmentation design. This includes your VPC and subnet design patterns with their associated Classless Inter-Domain Routing (CIDR) IP address ranges, route tables, and gateways (network address translation, internet, and transit gateways). Design your infrastructure segmentation to support an n-tiered application architecture. Enforce the separation of the data tier from the business logic tier, from the presentation tier, and from other tiers as appropriate. This segmentation design should also support elements of a Zero Trust strategy with a default deny mechanism. Network communications should be restricted to only what applications need to function and to enable business operations. This may require an analysis of application data flows and architectural design so that the requirements for infrastructure segmentation are properly understood.
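To make the tiered design concrete, the following is a minimal boto3 sketch that creates a VPC and one subnet per tier. The CIDR ranges, Region, and Availability Zone are illustrative assumptions; a real design would repeat the subnets across Availability Zones.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Illustrative CIDR plan: one /16 VPC carved into per-tier /24 subnets.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

tiers = {
    "presentation": "10.0.1.0/24",
    "application": "10.0.2.0/24",
    "data": "10.0.3.0/24",
}

subnet_ids = {}
for tier, cidr in tiers.items():
    subnet = ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock=cidr,
        AvailabilityZone="us-east-1a",  # repeat per AZ for high availability
    )
    subnet_ids[tier] = subnet["Subnet"]["SubnetId"]
    # Tag each subnet with its tier so security tooling can reason about it.
    ec2.create_tags(
        Resources=[subnet_ids[tier]],
        Tags=[{"Key": "tier", "Value": tier}],
    )
```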
Now that you understand your shared services and segmentation needs, design your route tables, network access control lists (network ACLs), and security groups to support and enforce them. Use route tables to connect your subnets as needed for application data flows to other subnets and their gateways. Use network ACLs to control access to your subnets. And use security groups to control traffic to EC2 instances and other network interfaces in your subnets.
Route tables are a set of rules used to determine where traffic flows, which matters when a network communication request is destined for a network interface that is not located in the same subnet. Design your route tables to enable only the network communications that are required for normal operations, in alignment with the segmentation design requirements.
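As an illustration, the sketch below creates a route table for an application subnet that routes only internet-bound traffic through a NAT gateway; all IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs from the segmentation design.
VPC_ID = "vpc-0123456789abcdef0"
APP_SUBNET_ID = "subnet-0aaaaaaaaaaaaaaaa"
NAT_GATEWAY_ID = "nat-0bbbbbbbbbbbbbbbb"

# A dedicated route table for the application tier.
route_table = ec2.create_route_table(VpcId=VPC_ID)
rt_id = route_table["RouteTable"]["RouteTableId"]

# Only send internet-bound traffic through the NAT gateway; traffic between
# subnets in the VPC is covered by the implicit local route.
ec2.create_route(
    RouteTableId=rt_id,
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=NAT_GATEWAY_ID,
)

# Associate the route table with the application subnet.
ec2.associate_route_table(RouteTableId=rt_id, SubnetId=APP_SUBNET_ID)
```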
Network ACLs act as a virtual firewall for associated subnets, controlling both inbound and outbound traffic at the subnet level. They are stateless, which means that return traffic will be denied by default, and both allow and deny rules are supported. VPCs, when created, have a default network access control list that, by default, allows all inbound and outbound traffic. All custom network access control lists deny any traffic by default. Subnets must be associated with a network access control list or they will be associated with the default one. Consider setting up default and custom network ACLs associated with subnets in alignment with the infrastructure segmentation design to restrict network communications. This network access control list design may be very similar to the security group design that is implemented.
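The following hedged sketch shows how a custom network ACL for a data subnet might be created with boto3, allowing only database traffic from the application tier. Because network ACLs are stateless, a matching outbound rule for return traffic is also required. The IDs, CIDR range, and port are assumptions.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

VPC_ID = "vpc-0123456789abcdef0"             # placeholder
APP_TIER_CIDR = "10.0.2.0/24"                # only the app tier may reach the data tier

# Custom network ACLs deny all traffic until entries are added.
nacl = ec2.create_network_acl(VpcId=VPC_ID)
nacl_id = nacl["NetworkAcl"]["NetworkAclId"]

# Inbound: allow PostgreSQL traffic from the application tier only.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id,
    RuleNumber=100,
    Protocol="6",               # TCP
    RuleAction="allow",
    Egress=False,
    CidrBlock=APP_TIER_CIDR,
    PortRange={"From": 5432, "To": 5432},
)

# Outbound: explicitly allow the ephemeral ports used for return traffic
# back to the application tier, because network ACLs are stateless.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id,
    RuleNumber=100,
    Protocol="6",
    RuleAction="allow",
    Egress=True,
    CidrBlock=APP_TIER_CIDR,
    PortRange={"From": 1024, "To": 65535},
)
```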
Security groups are like virtual firewalls that control the traffic flowing in and out of the resources associated with them, such as an EC2 instance. Security groups are stateful, so they allow return traffic, and they support allow rules only. Design the allow rules within your security groups so that they restrict network communications in alignment with the segmentation requirements. A security group design strategy should consist of creating security groups for applications that have similar functions and security requirements, and using security group "chaining" where applicable. Chaining is the concept of only allowing traffic from an EC2 instance that has a particular security group applied, which allows more flexibility and control in a dynamic environment. You can have up to five security groups applied to a single network interface. It's a good practice to apply security groups in layers: create more generic security groups for reusability, and then layer additional, more specific security groups for fine-grained control. A process for managing the lifecycle of security groups must be created to enable business operations (which typically include DevOps) and maintain proper security.
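A minimal boto3 sketch of security group chaining follows: the application tier security group only allows traffic whose source is the web tier security group, rather than an IP range. The VPC ID, group names, and port are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
VPC_ID = "vpc-0123456789abcdef0"  # placeholder

# One security group per tier with similar functions and security requirements.
web_sg = ec2.create_security_group(
    GroupName="web-tier", Description="Presentation tier", VpcId=VPC_ID
)["GroupId"]
app_sg = ec2.create_security_group(
    GroupName="app-tier", Description="Business logic tier", VpcId=VPC_ID
)["GroupId"]

# "Chaining": the application tier only accepts traffic from instances that
# carry the web-tier security group.
ec2.authorize_security_group_ingress(
    GroupId=app_sg,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 8443,
            "ToPort": 8443,
            "UserIdGroupPairs": [{"GroupId": web_sg}],
        }
    ],
)
```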
When deploying EC2 virtual server instances into VPCs to host services that process or store data for business operations, several components must be in place to protect the environment. The credentials used to connect to these instances must be managed and protected. The guest operating system and the software deployed within it must be managed so that updates and security patches are applied regularly, which reduces the chance of exposing vulnerabilities that can be exploited. The base "golden" image used to instantiate instances should be managed, custom built with hardened configurations, updated regularly, and should include the appropriate agent software. Lastly, the instances should be ephemeral to help reduce the chance of a persistent threat in the environment.
When deploying EC2 instances, the guest operating system Amazon Machine Image (AMI) should be custom built with hardened configurations, using Center for Internet Security (CIS) benchmarks or other security hardening standards. This type of image is typically referred to as the "golden" image or base image. It should be revised periodically so that it stays up to date and reduces exposure to vulnerabilities. Consider building an automated pipeline process and tooling where code is used to build and configure the golden AMI. EC2 instance images should also be built to include the appropriate agent software for logging, monitoring, patching, anti-malware, endpoint detection and response (EDR), IDS, and IPS. Consider using the Amazon CloudWatch agent or other monitoring and logging solutions. Some form of anti-malware software agent and enterprise management should be in place. We strongly recommend using an EDR, IDS, and IPS solution as well.
As a part of managing the golden AMI, enforce a lifecycle rule to make sure that running EC2 instances are not using earlier images with potential vulnerabilities. To support this, use ephemeral instances that do not contain any persistent data. These can be re-instantiated (also known as rehydration) at any time without reducing the operational effectiveness of the services being hosted on them. This may require storing persistent data in a database, an Amazon Elastic Block Store (Amazon EBS) volume, an Amazon Elastic File System (Amazon EFS) volume, or perhaps an S3 bucket.
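One way to support such a lifecycle rule is to periodically report running instances that are not using an approved golden AMI. The following is a hedged boto3 sketch; the approved AMI IDs are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Hypothetical set of currently approved golden AMI IDs.
APPROVED_AMIS = {"ami-0aaaaaaaaaaaaaaaa", "ami-0bbbbbbbbbbbbbbbb"}

stale_instances = []
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            if instance["ImageId"] not in APPROVED_AMIS:
                stale_instances.append(instance["InstanceId"])

# Instances on earlier images are candidates for rehydration (replacement).
print("Instances to rehydrate:", stale_instances)
```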
For remote host access, use AWS Systems Manager Session Manager, a managed capability. This gives you the ability to log in to your hosts without the need to open inbound ports, maintain bastion hosts, or manage SSH keys. It allows authorized users to access EC2 instances in a way that can be monitored and logged, without requiring SSH or RDP gateways or other infrastructure.
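As a small sketch under those assumptions, you might verify which instances are registered with Systems Manager (a prerequisite for Session Manager access) using boto3:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# List the managed instances that are currently reachable through Systems
# Manager, which Session Manager relies on for console-free access.
paginator = ssm.get_paginator("describe_instance_information")
for page in paginator.paginate():
    for info in page["InstanceInformationList"]:
        print(info["InstanceId"], info["PingStatus"], info.get("PlatformName", ""))
```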
You will eventually need to patch guest operating systems to remediate a vulnerability that has been discovered, so the appropriate mechanism must be in place to discover and remediate vulnerabilities on a regular basis. (See the vulnerability management section of this whitepaper for more detailed information.) Software deployed within the guest operating system will also need to be updated regularly. It may be off-the-shelf software, which could have integrations with the vulnerability management solution, or a self-updating feature that must be configured and run periodically. Keep in mind that each type of operating system and software will have its own unique requirements for how to discover and remediate vulnerabilities. Use EC2 Image Builder to build secure images; this reduces the effort to create and maintain golden images without building and maintaining numerous automations yourself.
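For example, a simple compliance check against your patch baselines could be scripted with boto3; the instance IDs below are placeholders.

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Placeholder instance IDs; in practice these might come from describe_instances.
INSTANCE_IDS = ["i-0123456789abcdef0", "i-0fedcba9876543210"]

# Report instances that are missing patches relative to their patch baseline.
response = ssm.describe_instance_patch_states(InstanceIds=INSTANCE_IDS)
for state in response["InstancePatchStates"]:
    if state.get("MissingCount", 0) > 0:
        print(f"{state['InstanceId']} is missing {state['MissingCount']} patches")
```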
When using container services, such as Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS), several security controls should be configured. The container images that are built and used, the infrastructure (the Fargate or EC2 launch types), the containers as they run, and the networks that the container environments use must all be protected. Auditing, logging, and incident forensics must also be configured for observability. To protect the container images:
- Confirm that images used by the containers are scanned for CVEs. Amazon ECR basic image scanning can be used to scan on push, or manually. Enhanced scanning in Amazon ECR can also be enabled; it integrates with Amazon Inspector to provide automated scanning of image repositories for both OS and programming language package vulnerabilities (see the sketch after this list).
- Remove unnecessary packages to reduce the attack surface as the images are built. With a single Dockerfile, images can be built from scratch using a multi-stage build technique. Consider using base OS images that do not include certain components, such as a package manager or shell.
- Run all container applications as a non-root user, which can be enforced by linting the Dockerfile. Reduce the risk of permission escalation by removing any files in containers that have the setuid or setgid bit set.
- Use VPC endpoints to access Amazon ECR through a private connection within the VPC. Use endpoint policies to restrict the IAM roles that can access container images.
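The sketch below, using boto3 and a hypothetical repository name and tag, triggers a basic Amazon ECR scan and summarizes the findings by severity. With scan-on-push or enhanced scanning enabled, the explicit scan call is unnecessary.

```python
import boto3

ecr = boto3.client("ecr", region_name="us-east-1")

# Placeholder repository and tag.
REPO = "payments-api"
TAG = "1.4.2"

# Trigger a basic scan of a pushed image (not needed when scan-on-push or
# enhanced scanning with Amazon Inspector is enabled).
ecr.start_image_scan(repositoryName=REPO, imageId={"imageTag": TAG})

# Retrieve and summarize the findings by severity.
# (In practice, poll until the scan status is COMPLETE before reading findings.)
findings = ecr.describe_image_scan_findings(
    repositoryName=REPO, imageId={"imageTag": TAG}
)
severity_counts = findings["imageScanFindings"].get("findingSeverityCounts", {})
print(severity_counts)  # for example {"HIGH": 2, "MEDIUM": 5}
```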
For infrastructure security controls in the container environment:
- Always use an operating system that is optimized for running containers, such as Amazon EKS optimized Amazon Linux 2 or Bottlerocket. Upgrade to the latest AMIs to reduce the chance of vulnerabilities. For Amazon EKS, using a managed node group keeps your application running while using a rolling-update deployment method, which brings up new nodes using the latest AMI while shutting down nodes on earlier AMIs. Amazon ECS also supports rolling updates using the minimumHealthyPercent and maximumPercent deployment configuration parameters (see the sketch after this list).
- Deploy container instances and worker nodes into private subnets. Any publicly accessible container instances and worker nodes should have restricted security groups associated with them.
- Minimize access to container instances and worker nodes, and audit access to them. Do not use SSH keys. Use AWS Systems Manager Session Manager to access instances and nodes; it also audits and logs the commands that were run. Custom AMIs that include the SSM Agent can be deployed when bootstrapping the container instances or worker nodes as they are launched, or the agent can be run as a DaemonSet.
- Periodically audit the configuration of the container cluster to maintain a compliant configuration over time. For Amazon EKS, run kube-bench to continually check the cluster configuration against the CIS Amazon EKS Benchmark.
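As a hedged illustration of the rolling-update parameters mentioned above, the following boto3 call configures a hypothetical ECS service to keep all tasks healthy during a rollout and forces a new deployment onto refreshed container instances.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Placeholder cluster and service names.
ecs.update_service(
    cluster="prod-cluster",
    service="payments-api",
    deploymentConfiguration={
        # Keep all running tasks healthy while new tasks start.
        "minimumHealthyPercent": 100,
        # Allow up to double the desired count during the rollout.
        "maximumPercent": 200,
    },
    # Force a new deployment so tasks are replaced onto instances that run
    # the latest container-optimized AMI.
    forceNewDeployment=True,
)
```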
For runtime security of the running application container:
- Employ least privilege access to AWS resources. Keep in mind that users may not require IAM permissions to an AWS resource; they may need access to the cluster API instead. For Amazon EKS, use role-based access in the aws-auth ConfigMap to allow users to assume an IAM role for simpler management of this access. For Amazon ECS, use IAM roles for tasks; for Amazon EKS, use IAM Roles for Service Accounts (IRSA), and give each pod a separate role.
- Run dynamic scans of the running containers using third-party tools such as Sysdig Falco, Prisma Cloud, or Aqua.
- Use policy as code to enforce security standards, using third-party solutions such as Open Policy Agent.
- For Linux-based containers, consider using tools like seccomp to restrict unauthorized syscalls from running.
To protect the network used by the container cluster environment:
- Use TLS encryption to protect all the data being transmitted to and from any pod or task, and certainly use this for sensitive data transmissions. Use mutual-TLS (mTLS) for communication between services.
- For Amazon EKS, use security groups for pods to restrict access to other AWS resources, and consider installing the Calico network policy engine to use Kubernetes network policies to restrict traffic within the cluster. For Amazon ECS, use the awsvpc network mode to attach an elastic network interface and security group to the task. The security group should be configured to restrict access to other AWS resources and other tasks within the cluster.
To enable auditing, logging, and forensics capabilities in the container environment:
- For Amazon EKS, enable the control plane logs, which include the logs for the Kubernetes API server, the controller manager, the scheduler, and the audit log (see the sketch after this list). For Amazon ECS, configure the containers in your tasks to send log information to Amazon CloudWatch Logs. Consider using the AWS-provided Fluent Bit image with plugins for both Amazon CloudWatch Logs and Amazon Data Firehose.
- Centrally aggregate all logs and ingest them into a tool, such as a security information and event management (SIEM) solution. Log information can then be correlated quickly to identify indicators of compromise (IoCs) that must be investigated.
- If pods or tasks appear to be compromised, follow incident response procedures and consider isolating them to prevent further corruption in the environment. Remove or change labels and associate a network policy or security group to isolate pods or tasks. Consider cordoning worker nodes in Amazon EKS. Use instance draining in Amazon ECS to perform forensic analysis on the worker node or Amazon ECS instances (capturing memory, processes, or snapshots).
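Enabling the Amazon EKS control plane logs can be done through the console or programmatically; the following is a minimal boto3 sketch for a hypothetical cluster name.

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# Enable every control plane log type for a placeholder cluster; the logs are
# delivered to Amazon CloudWatch Logs.
eks.update_cluster_config(
    name="prod-cluster",
    logging={
        "clusterLogging": [
            {
                "types": [
                    "api",
                    "audit",
                    "authenticator",
                    "controllerManager",
                    "scheduler",
                ],
                "enabled": True,
            }
        ]
    },
)
```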
If you are planning to use a hybrid model with some hosts and services in AWS, and some in your on-premises data center, use AWS Direct Connect or AWS Site-to-Site VPN. If you use Direct Connect, make sure to use the transit encryption options for each service. This will encrypt data in transit that traverses AWS Direct Connect. Use VPN over Direct Connect to provide an IPsec-encrypted private connection that reduces network costs, increases bandwidth throughput, and provides a more consistent network experience than internet-based VPN connections.
Advance
As your cloud environment starts to grow, you may end up with hundreds of accounts and VPCs to manage. The best practices described in this section reflect the security measures that will help you configure and maintain your infrastructure at scale.
With a growing cloud environment, managing VPC peering and hybrid connectivity per VPC may become challenging. You might require additional inbound and outbound controls to support the scale and additional requirements. AWS Transit Gateway is a central hub that connects VPCs and on-premises networks. You can use Transit Gateway routing domains to route traffic between VPCs and your on-premises environment based on your routing requirements. Centralizing your networking with Transit Gateway also allows you to use AWS Network Firewall in a centralized VPC to inspect east-west (VPC-to-VPC) and north-south (internet egress and ingress, and on-premises) traffic.
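A hedged boto3 sketch of this hub-and-spoke pattern follows: it creates a transit gateway with default route table association and propagation disabled (so traffic can be steered through custom routing domains) and attaches a placeholder workload VPC.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a central transit gateway in the organization's network account.
tgw = ec2.create_transit_gateway(
    Description="Central hub for VPC and on-premises connectivity",
    Options={
        "DefaultRouteTableAssociation": "disable",  # use custom routing domains
        "DefaultRouteTablePropagation": "disable",
    },
)
tgw_id = tgw["TransitGateway"]["TransitGatewayId"]

# Attach a workload VPC (placeholder IDs) through subnets in each AZ.
# In practice, wait until the transit gateway state is "available" first.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_id,
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0aaaaaaaaaaaaaaaa", "subnet-0bbbbbbbbbbbbbbbb"],
)
```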
Create a dedicated inspection VPC comprising two subnets in each Availability Zone.
For network layer inbound and outbound filtering, you can use AWS Network Firewall. With Network Firewall, you can write thousands of rules to filter traffic from the internet, from on-premises networks, between VPCs, or from subnets within VPCs. If your VPCs are connected through a Transit Gateway, you can use Network Firewall in a centralized way to inspect traffic. Network Firewall provides stateless traffic filtering using 5-tuple rule specifications (source IP and port, destination IP and port, and protocol) and stateful traffic filtering with Suricata-compatible IPS rules, domain list rules, and 5-tuples. Start with managed rule groups (which are developed and maintained by AWS) and then add custom rules to fill any gaps.
If you want to use the same virtual appliances that you use in your on-premises environment, you can use Gateway Load Balancer (GWLB). With Gateway Load Balancer, you can use virtual appliances from independent software vendors (ISVs) and build your firewall policies without the overhead of maintaining that load balancing infrastructure yourself. For outbound domain filtering, you can use a combination of Amazon Route 53 Resolver DNS Firewall and AWS Network Firewall. DNS Firewall can be used to inspect queries that pass through the Route 53 Resolver, while Network Firewall can provide FQDN-based domain filtering at both the network and application layers.
For running web applications, deploy AWS Web Application Firewall (AWS WAF) to protect them from common web exploits and bots that may affect availability, compromise security, or consume excessive resources. If you experience bot activity, use the AWS WAF Bot Control features.
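As a hedged sketch, the following boto3 call creates a regional web ACL that attaches one AWS managed rule group; the ACL name and metric names are placeholders. The Bot Control managed rule group (AWSManagedRulesBotControlRuleSet) could be added as an additional rule in the same way.

```python
import boto3

# Use Scope="REGIONAL" for ALBs and API Gateway; CloudFront distributions
# require Scope="CLOUDFRONT" and a client in us-east-1.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

wafv2.create_web_acl(
    Name="web-app-protection",            # placeholder name
    Scope="REGIONAL",
    DefaultAction={"Allow": {}},
    Rules=[
        {
            "Name": "aws-common-rules",
            "Priority": 0,
            # AWS managed rule group covering common web exploits.
            "Statement": {
                "ManagedRuleGroupStatement": {
                    "VendorName": "AWS",
                    "Name": "AWSManagedRulesCommonRuleSet",
                }
            },
            "OverrideAction": {"None": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "aws-common-rules",
            },
        }
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "web-app-protection",
    },
)
```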
For medium and large organizations, we recommend using AWS Shield Advanced for Distributed Denial of Service (DDoS) protection. Shield Advanced provides additional detection and mitigation against large and sophisticated DDoS events, adds application layer (layer 7) protection, and provides near real-time visibility into events. This is in addition to the network and transport layer (layer 3 and 4) DDoS protection that AWS Shield Standard provides for Amazon EC2, Elastic Load Balancing, Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53 resources. Shield Advanced also provides 24x7 access to the Shield Response Team (SRT) and protection against DDoS-related spikes in your Amazon EC2, Elastic Load Balancing, Amazon CloudFront, AWS Global Accelerator, and Amazon Route 53 charges. We also recommend enabling proactive engagement with Shield Advanced so that the SRT can monitor the status of your resources proactively.
In addition to the IPS (Intrusion Prevention System) capabilities provided by Network Firewall, AWS supports additional IPS and IDS capabilities. If you prefer to use an IPS/IDS that you already use in your on-premises environment, you can use Gateway Load Balancer or VPC traffic mirroring. With Gateway Load Balancer, you can deploy virtual appliances from the AWS Partner Network and AWS Marketplace.
AWS Firewall Manager can centrally configure and manage firewall rules across your accounts and applications in AWS Organizations. With AWS Firewall Manager, you can configure VPC security groups and audit any existing VPC security groups for your Amazon EC2, Application Load Balancer, and elastic network interface (ENI) resources. Write AWS WAF rules and associate them with your Application Load Balancers, Amazon API Gateway APIs, and Amazon CloudFront distributions. Activate AWS Shield Advanced protection for your Application Load Balancers, Classic Load Balancers, Elastic IP addresses, and CloudFront distributions. Use AWS Firewall Manager to distribute AWS Network Firewall rules across multiple accounts and to associate your VPCs with Amazon Route 53 Resolver DNS Firewall rules.
When an organization grows, the number of hosts it must manage also grows significantly. Manually managing these instances adds overhead and introduces errors. We recommend using AWS Systems Manager to automate maintenance and deployment tasks such as running commands, collecting software inventory, and applying OS patches. For example, use the Run Command feature of AWS Systems Manager to run chmod og-rwx /etc/ssh/sshd_config, which confirms that the permissions on /etc/ssh/sshd_config are configured as part of the security hardening of the Amazon Linux 2 operating system in an AMI. Use services like Amazon Inspector or other vulnerability scanning software to discover vulnerabilities. AWS Systems Manager Patch Manager or another patch management solution can assist in deploying remediations.
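A minimal boto3 sketch of that Run Command example follows; the tag used to target instances is an assumption, and you could pass explicit instance IDs instead.

```python
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

# Run the hardening command on every managed instance carrying an
# illustrative "os=amazon-linux-2" tag, rather than listing instance IDs.
response = ssm.send_command(
    Targets=[{"Key": "tag:os", "Values": ["amazon-linux-2"]}],
    DocumentName="AWS-RunShellScript",   # AWS-provided command document
    Parameters={
        "commands": ["chmod og-rwx /etc/ssh/sshd_config"],
    },
    Comment="Restrict permissions on sshd_config",
)
print(response["Command"]["CommandId"])
```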
To further reduce security maintenance, you can use managed services, such as Amazon Relational Database Service (Amazon RDS) and AWS Lambda, whenever feasible. Amazon RDS will help you deploy and manage relational databases without the maintenance overhead of hardware provisioning, patch management, or backup. Similarly, you can use Lambda to run code without maintaining servers, so you can focus on the functionality and security of your code without the overhead of managing infrastructure.
Excel
At this stage of maturity, best practices include enhancing architectures that have already been set to achieve Zero Trust, while fully managing infrastructure as code. Use automation to avoid human errors and embed security into DevOps. Whether you have dedicated teams managing infrastructure security, or have chosen a distributed DevOps model, it is important to have a RACI matrix that helps identify task or activity ownership while clearly defining roles and responsibilities.
Use AWS CloudFormation or Terraform to define and deploy your infrastructure as code.
To simplify the configuration of AWS WAF rules, create and deploy a solution that automatically deploys a web access control list (web ACL). The web ACL should include AWS WAF rules with protective features that control which traffic is allowed to reach web applications and APIs deployed on Amazon CloudFront, Application Load Balancers, or Amazon API Gateway, and that filter out web-based attacks. Design and deploy automation for AWS Firewall Manager to simplify the configuration, management, and auditing of firewalls, AWS WAF rules, VPC security groups, and Route 53 DNS Firewall rules. Use published solutions such as the Security Automations for AWS WAF solution.
For security-enabled DevOps, the DevOps teams and security teams must find a balance between control and speed: between which controls are centralized and which a workload team can configure. A fast feedback loop can inform your automation and keep the focus on managing risk. The DevOps team must be able to iterate quickly, so the security team should build a faster process for changes in lower environments (such as development or test environments). The security team should maintain the ability to regulate changes in both the lower and higher environments.
To further enable DevOps, implement a tagging strategy in the lower environments only. This permits rules to be evaluated against the allowlist without being shut down. An alert should be sent to the development team when a rule they created is not in compliance. They can then follow a process to get those rules added to the allowlist in production.