This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.
Data protection
Note
Maintain visibility and control over your data, and over how it is accessed and used in your organization.
Protecting your data from unintended or unauthorized access, and from potential unauthorized changes, helps develop and drive the data protection component of your cloud security strategy. As you migrate data to the AWS Cloud, maintain visibility and control over how your data is being accessed and used throughout your organization. Before architecting any system, foundational practices that influence security should be in place. For example, data classification provides a way to categorize organizational data based on levels of sensitivity, and encryption protects that data by rendering it unintelligible to unauthorized access. These tools and techniques support objectives such as preventing financial loss and complying with regulatory obligations.
Organizations will often look to compliance frameworks, such as those published by the National Institute of Standards and Technology (NIST), to inform their data protection requirements.
Start
To start building out a data protection strategy in AWS, focus on the following four elements.
Identify the data within your workloads.
Understand the type and classification of data that your workloads process. What are the associated business processes, data owners, and applicable legal and compliance requirements? Where is your data stored? These answers will help you identify the security controls that must be enforced to secure your environment. This may include classifications to indicate whether the data is intended to be:
- Publicly available,
- Internal use only, such as customer personally identifiable information (PII), or
- Intended for more restricted access, such as intellectual property, legally privileged, or marked sensitive.
By carefully managing an appropriate data classification system, along with each workload's protection requirements, you can accurately map the controls and the level of access and protection appropriate for your data. For example, public content is available for anyone to access, but internal or sensitive content may be encrypted and stored in a protected manner that requires authorized access to a key for decrypting the content.
Define data protection controls.
Use resource tags, and separate AWS accounts per sensitivity (and potentially also per community of interest). Use IAM policies, SCPs, AWS KMS, and AWS CloudHSM to define and implement your requirements for data classification and protection with encryption. For example, if you have a project with S3 buckets that contain highly critical data, or EC2 instances that process confidential data, they can be tagged with a "Project=ABC" tag. Only your immediate team knows what the project code means, and it provides a way to use attribute-based access control (ABAC). If you are making authorization decisions based on tags, make sure that the permissions on the tags themselves are defined appropriately, using tag policies in AWS Organizations.
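As a minimal sketch of this pattern in Python (boto3), the following tags a hypothetical bucket with Project=ABC and shows an ABAC-style policy statement that grants access only to principals carrying the matching tag; the bucket name and tag values are illustrative assumptions.

```python
import boto3

# Tag a bucket so ABAC conditions can match on it (bucket name is hypothetical).
s3 = boto3.client("s3")
s3.put_bucket_tagging(
    Bucket="example-project-abc-data",
    Tagging={"TagSet": [{"Key": "Project", "Value": "ABC"}]},
)

# An ABAC-style IAM statement: access is allowed only when the calling
# principal's Project tag matches the project code on the resource.
abac_statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::example-project-abc-data/*",
    "Condition": {"StringEquals": {"aws:PrincipalTag/Project": "ABC"}},
}
```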
Define data lifecycle management.
Base your lifecycle strategy on sensitivity level as well as legal and organizational requirements. Include the duration for which you retain data, data destruction processes, data access management, data transformation, and data sharing. When choosing a data classification methodology, balance usability against access. Always use a defense in depth approach, and reduce human access to data and to the mechanisms for transforming, deleting, or copying data. For example, require users to strongly authenticate to an application, and give the application, rather than the users, the requisite access permissions to perform "action at a distance." In addition, confirm that users come from a trusted network path and require access to the decryption keys. Use tools, such as dashboards and automated reporting, to give users information from the data rather than direct access to the data.
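As a minimal sketch of such a lifecycle (bucket name, prefix, and retention periods are hypothetical and should follow your legal requirements), the following transitions objects to archival storage after 90 days and deletes them after roughly seven years.

```python
import boto3

# A lifecycle rule: archive after 90 days, expire after ~7 years
# (bucket name, prefix, and durations are hypothetical).
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-records",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "retain-then-expire",
                "Filter": {"Prefix": "records/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 2555},
            }
        ]
    },
)
```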
Automate identification and classification.
Automating the identification and classification of data can help you implement the correct controls. Rather than relying on manual processes, use automation to reduce the risk of human error and exposure. Amazon Macie uses machine learning to automatically discover, classify, and protect sensitive data in AWS, and recognizes sensitive data such as PII or intellectual property. Macie provides you with dashboards and alerts that give visibility into how this data is being accessed or moved. To help you determine appropriate protection and retention controls, classify your data based on criticality and sensitivity.
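As a hedged sketch (assuming Amazon Macie is already enabled in the account and Region), the following lists recent Macie findings so they can feed dashboards or alerts.

```python
import boto3

# List recent Macie findings and print their type and affected resources
# (assumes Macie is enabled in this account and Region).
macie = boto3.client("macie2")

finding_ids = macie.list_findings(maxResults=25)["findingIds"]
if finding_ids:
    for finding in macie.get_findings(findingIds=finding_ids)["findings"]:
        print(finding["type"], finding.get("resourcesAffected", {}))
```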
Summary
These four areas form the foundation on which other aspects of data protection (such as encryption at rest or in transit) rely.
AWS provides prescriptive guidance within the Well-Architected Framework and security whitepapers to help organizations develop a strategy that supports their broader data protection initiatives in AWS.
First, develop a data classification strategy, and discover and identify where and what data resides in your workloads on AWS. Then, tag your data, resources, and assets to allow for proper scoping of security controls. Developing a tagging strategy is as important as developing your data classification strategy and the subsequent discovery of classified data. Tagging can be an effective scaling mechanism for implementing cloud management and governance strategies, and it helps you identify, protect, detect, and respond to events in AWS. Tags can simplify ABAC, streamline automation and operations, group resources for enhanced visibility, and support effective cost management.
Advance
Now that the foundation has been set in AWS, protect resources that have been classified, discovered, and tagged. Think of data protection in AWS as layers, and build out a defense in depth approach to protect your resources. Don't limit your protection to a single control, whether it is for an EC2 instance hosting an application, an EBS volume, or a file containing sensitive information in Amazon S3. The AWS Well-Architected Framework provides similar guidance.
Apply security at all layers
Apply a defense in depth approach with multiple security controls. Apply to all layers (for example, edge of network, VPC, load balancing, every instance and compute service, operating system, application, and code).
Focus your data protection controls on these core areas:
Identity and access management
Embracing least privilege is critical to the protection of data, both from accidental exposure and from potential malicious activity. When crafting IAM policies, use specific Amazon Resource Names (ARNs) and condition statements to limit overexposure of resources. Avoid using wildcard values whenever possible to reduce the overall permissions granted to AWS entities. Enable AWS IAM Access Analyzer to monitor these access policies and alert you when it detects policies that grant access to external accounts.
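As one illustration of these points, the following statement (bucket name is hypothetical) grants a single action on a specific ARN and adds a condition requiring TLS, instead of wildcards.

```python
# A least-privilege statement: one action, a specific bucket ARN, and a
# condition that rejects unencrypted transport (bucket name is hypothetical).
least_privilege_statement = {
    "Effect": "Allow",
    "Action": ["s3:GetObject"],
    "Resource": "arn:aws:s3:::example-finance-reports/*",
    "Condition": {"Bool": {"aws:SecureTransport": "true"}},
}
```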
Different controls, including access (using least privilege), backups (see the Reliability Pillar - AWS Well-Architected Framework), isolation, and versioning, can all help protect your data at rest. Access to your data should be audited using detective mechanisms covered earlier in this whitepaper, including AWS CloudTrail and service-level logs such as S3 access logs. Inventory what data is publicly accessible, and plan for how you can reduce the amount of data available over time. Amazon S3 Glacier Vault Lock and S3 Object Lock provide mandatory access control: once a vault policy is locked with the compliance option, not even the root user can change it until the lock expires. Access should also link directly to the organization's data classification strategy. This embraces the principle of least privilege, as well as proper scoping of business functions.
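As a minimal sketch of the Object Lock capability (bucket name is hypothetical, and Object Lock must have been enabled when the bucket was created), the following sets a compliance-mode default retention that even the root user cannot shorten.

```python
import boto3

# Apply a compliance-mode Object Lock default: objects are immutable for
# 365 days (bucket name is hypothetical; Object Lock must already be enabled).
s3 = boto3.client("s3")
s3.put_object_lock_configuration(
    Bucket="example-audit-archive",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 365}},
    },
)
```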
Resource policies
Many resources in AWS, such as VPC endpoints, S3 buckets, and KMS keys, support resource-based policies that define how, and by which principals, the resource can be accessed.
Encryption
A critical component of any data protection strategy is encryption of data at rest, in transit, and in use.
Data at rest represents any data that you persist in non-volatile storage for any duration in your workload. This includes block storage, object storage, databases, archives, IoT devices, and any other storage medium on which data is persisted. Protecting your data at rest reduces the risk of unauthorized access when encryption and appropriate access controls are implemented.
Data in transit is any data that is sent from one system to another. This includes communication between resources within your workload as well as communication between other services and your end users. By providing the appropriate level of protection for your data in transit, you protect the confidentiality and integrity of your workload's data.
Data in use refers to data that is not simply being passively stored in a stable destination, such as a central data warehouse, but is being actively processed, for example in an application's memory.
Encryption should be the default way you store sensitive data. AWS KMS integrates seamlessly with many AWS services to make it easier for you to encrypt all your data at rest. For example, in Amazon S3 you can set default encryption on a bucket so that all new objects are automatically encrypted. Both Amazon EBS and Amazon S3 support the enforcement of encryption through default encryption settings. Use AWS Config managed rules to automatically check that you are using encryption for EBS volumes, Amazon RDS instances, and S3 buckets.
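A minimal sketch of enforcing these defaults, assuming a hypothetical bucket name and KMS key alias:

```python
import boto3

# Set default SSE-KMS encryption on a bucket so new objects are encrypted
# automatically (bucket name and key alias are hypothetical).
s3 = boto3.client("s3")
s3.put_bucket_encryption(
    Bucket="example-sensitive-data",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-data-key",
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)

# Turn on EBS encryption by default for new volumes in this Region.
ec2 = boto3.client("ec2")
ec2.enable_ebs_encryption_by_default()
```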
Key management
To support the encryption components listed previously, use of a key management service such as AWS KMS is recommended. The KMS keys that you create are customer managed keys. AWS services that use KMS keys to encrypt your service resources often create keys for you: KMS keys that AWS services create in your AWS account are AWS managed keys, and KMS keys that AWS services create in a service account are AWS owned keys. Use AWS KMS customer managed keys to generate keys for specific applications, workloads, services, and environments. There should be different keys for Human Resources applications and Finance applications, and for Amazon S3 and Amazon RDS, and you should have different keys for development environments and production environments. Fine-grained access controls can be applied to keys to limit scope and any potential impact. More information about AWS KMS and best practices can be found in Security Best Practices for AWS Key Management Service.
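The following hedged sketch creates one customer managed key scoped to a single application and environment, with an alias and tags that encode that scope (all names are hypothetical).

```python
import boto3

# Create a customer managed key for one workload and environment, and give
# it an alias encoding both (names are hypothetical).
kms = boto3.client("kms")

key = kms.create_key(
    Description="Finance application, production data at rest",
    Tags=[
        {"TagKey": "Application", "TagValue": "finance"},
        {"TagKey": "Environment", "TagValue": "production"},
    ],
)
kms.create_alias(
    AliasName="alias/finance-production",
    TargetKeyId=key["KeyMetadata"]["KeyId"],
)
```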
Logging, monitoring, and alerting
To confirm that the data protection controls are working as expected, it is important to set up detective controls, and then responsive controls as your maturity in the cloud increases. Use automated tools to validate and enforce data-at-rest controls continuously. For example, verify that there are only encrypted storage resources: you can automate validation that all EBS volumes are encrypted using AWS Config rules, and AWS Security Hub can aggregate such checks and related findings across your accounts.
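As a minimal sketch of this kind of automation, the following deploys the AWS managed Config rule that flags unencrypted EBS volumes; it assumes an AWS Config recorder is already running in the account.

```python
import boto3

# Deploy the AWS managed rule ENCRYPTED_VOLUMES, which marks EBS volumes
# that are not encrypted as noncompliant (assumes AWS Config is enabled).
config = boto3.client("config")
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "encrypted-volumes",
        "Source": {"Owner": "AWS", "SourceIdentifier": "ENCRYPTED_VOLUMES"},
    }
)
```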
Audit the use of encryption keys to validate that the access control mechanisms on the keys are appropriately implemented. For example, any AWS service using an AWS KMS key logs each use in AWS CloudTrail. You can then query AWS CloudTrail and make sure that all uses of your keys are valid, by using a tool such as Amazon CloudWatch Logs Insights.
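As one hedged illustration, the following runs a CloudWatch Logs Insights query over a CloudTrail log group to surface recent KMS usage; the log group name is a hypothetical placeholder, and the trail must be configured to deliver to CloudWatch Logs.

```python
import time

import boto3

# Query a CloudTrail log group for recent KMS API activity
# (log group name is hypothetical).
logs = boto3.client("logs")
query = logs.start_query(
    logGroupName="/aws/cloudtrail/example-trail",
    startTime=int(time.time()) - 86400,  # past 24 hours
    endTime=int(time.time()),
    queryString=(
        "fields @timestamp, userIdentity.arn, eventName "
        "| filter eventSource = 'kms.amazonaws.com' "
        "| sort @timestamp desc | limit 50"
    ),
)

# Poll until the query finishes, then print each matching event.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
for row in results.get("results", []):
    print(row)
```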
Amazon GuardDuty automatically detects suspicious activity or attempts to move data outside of defined boundaries. For example, GuardDuty can detect unusual S3 read activity with the Exfiltration:S3/ObjectRead.Unusual finding.
In addition to Amazon GuardDuty, Amazon VPC Flow Logs, which capture network traffic information, can be used with Amazon EventBridge to trigger detection of abnormal connections, both successful and denied. Access Analyzer for S3 alerts you to S3 buckets that are configured to allow access to anyone on the internet or to other AWS accounts, including accounts outside of your organization.
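To act on such findings automatically, a sketch like the following (rule name and SNS topic ARN are hypothetical) routes GuardDuty S3 exfiltration findings to an alerting topic through EventBridge.

```python
import json

import boto3

# Match GuardDuty findings whose type starts with "Exfiltration:S3/" and
# forward them to an SNS topic (rule name and topic ARN are hypothetical;
# the topic policy must allow EventBridge to publish).
events = boto3.client("events")
events.put_rule(
    Name="example-s3-exfiltration-findings",
    EventPattern=json.dumps(
        {
            "source": ["aws.guardduty"],
            "detail-type": ["GuardDuty Finding"],
            "detail": {"type": [{"prefix": "Exfiltration:S3/"}]},
        }
    ),
)
events.put_targets(
    Rule="example-s3-exfiltration-findings",
    Targets=[
        {"Id": "alerts", "Arn": "arn:aws:sns:us-east-1:111122223333:example-alerts"}
    ],
)
```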
Summary
Putting all of this guidance together, a real-world example would be a principal (human or machine) that must upload data to Amazon S3, where the data will then be pulled down and processed by an application. As part of its IAM role policy, the principal has access to Amazon S3, but through constraints in the policy it can only upload to specific S3 buckets. The principal also has permission to use a customer managed key (CMK) in AWS KMS to encrypt the data in Amazon S3. The CMK itself has a resource policy attached that permits the principal to invoke only the encrypt API. The same resource policy also names the application's IAM role and permits it to invoke the decrypt API as part of the download and processing by the application. The S3 bucket has a resource policy attached that allows uploads only from the defined principals and downloads only from the defined application IAM roles. It also restricts access to selected VPCs inside the AWS account, which is part of a defined AWS organization. Finally, the bucket policy allows uploads only if a specific KMS key is specified as part of the PUT request.
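One layer of that example can be expressed as a bucket policy statement. The following sketch (bucket name and key ARN are hypothetical) denies any PUT that does not specify the expected KMS key.

```python
# A deny statement for the bucket policy: uploads are rejected unless the
# request names the expected KMS key (bucket name and key ARN are hypothetical).
deny_wrong_key = {
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::example-ingest-bucket/*",
    "Condition": {
        "StringNotEquals": {
            "s3:x-amz-server-side-encryption-aws-kms-key-id":
                "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
    },
}
```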
In the preceding example, if one control fails, the data is not exposed, and in many cases, if two controls fail, the data is still protected. This might happen if the S3 bucket was accidentally made public and the principal's account credentials were compromised. This is an example of a defense in depth approach.
Excel
At this stage of cloud maturity, data protection best practices include enhancing the foundation that was initially set, iterating on the capabilities built upon that foundation, and moving toward a shift-left culture in your DevOps practices.
Advanced permissions management
Protecting data depends on the permission structure that you have built. While basic IAM provisions are an essential part of the foundation for all data protection strategies, implement the more advanced capabilities of IAM where possible. ABAC is an advanced permissions management model that scopes permissions by the tags attached to resources and entities. For example, an IAM role tagged "Project X" should only have access to resources also tagged "Project X". More advanced AWS services can be used to further analyze and refine AWS permissions within your environment to adhere to the principle of least privilege. If you are using AWS Organizations, SCPs can prevent sensitive actions from being taken on your most critical resources unless stringent conditions are met. IAM Access Analyzer is a tool that can identify external access into your AWS environment, and it has policy generation and validation capabilities to help you further scope your IAM policies. It uses machine learning (ML) to understand the minimum permissions needed by your everyday workloads. These tools should be leveraged to automatically redefine IAM policies on principals based on access needs observed over time.
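As a hedged sketch of the SCP idea, the following statement (expressed as a Python dict, with a hypothetical admin role name) blocks KMS key deletion across the organization unless the request comes from a designated role.

```python
# An SCP statement: deny disabling or scheduling deletion of KMS keys unless
# the caller is the designated key-administration role (role name is hypothetical).
scp_statement = {
    "Effect": "Deny",
    "Action": ["kms:ScheduleKeyDeletion", "kms:DisableKey"],
    "Resource": "*",
    "Condition": {
        "StringNotLike": {"aws:PrincipalArn": "arn:aws:iam::*:role/example-kms-admin"}
    },
}
```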
Advanced encryption
Beyond the concepts of encryption of data at rest and in transit, advanced encryption should focus on using specific ciphers and elliptic curves for varying data protection needs. You should consider post-quantum cryptography protections, tokenization, format-preserving encryption (FPE), and masking. These types of encryption concepts are meant to solve specific industry and regulatory needs in addition to enhancing encryption protections. For example, use Federal Information Processing Standard (FIPS) validated endpoints where regulations require them.
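Where FIPS validated endpoints are required, the SDK can be pointed at them explicitly; this minimal sketch uses the KMS FIPS endpoint in us-east-1.

```python
import boto3

# Create a KMS client that uses the FIPS validated endpoint for us-east-1.
kms_fips = boto3.client(
    "kms",
    region_name="us-east-1",
    endpoint_url="https://kms-fips.us-east-1.amazonaws.com",
)
print(kms_fips.list_keys(Limit=5)["Keys"])
```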
Key management
Another domain of data protection to develop further along with your organization's cloud maturity is key management. Using AWS KMS, you can enforce automated key rotation schedules, confirm that resources are encrypted with distinct keys, define key policies that restrict which principals have administrative permissions (create and delete keys) and which have usage permissions (encrypt and decrypt), and more. To meet other use cases, or for customers who want to control the generation of key material, AWS CloudHSM offers a managed hardware security module solution that allows you to generate and use your own keys within a dedicated hardware space on AWS. AWS CloudHSM can be connected to AWS KMS to create a custom key store, where you can generate and manage keys inside a FIPS 140-2 Level 3 boundary.
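For example, automatic annual rotation can be enforced per key with a couple of calls; this minimal sketch assumes a hypothetical key ID.

```python
import boto3

# Enable annual automatic rotation for a customer managed key and confirm it
# (key ID is hypothetical).
kms = boto3.client("kms")
kms.enable_key_rotation(KeyId="1234abcd-12ab-34cd-56ef-1234567890ab")
status = kms.get_key_rotation_status(KeyId="1234abcd-12ab-34cd-56ef-1234567890ab")
print(status["KeyRotationEnabled"])
```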
Replication, backups, and recovery
A robust data protection strategy should have measures in place to prevent unintended data deletion. Confirm that replication, backup, and recovery mechanisms are implemented so your data is resilient against unforeseen incidents of data loss. Carefully consider the replication capabilities of the AWS services that handle your data, and determine the replication approach that best fits your workloads. AWS Backup is a service that allows you to centrally manage and automate backups across AWS services and hybrid workloads. Implement automated recovery plans for your resources so that, in the event of unexpected failure, your workloads experience minimal downtime.
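As a minimal sketch of centralized backups, the following creates a daily AWS Backup plan with a 35-day retention window; the plan and vault names are hypothetical, and the vault must already exist.

```python
import boto3

# A daily backup plan: run at 05:00 UTC, keep recovery points for 35 days
# (plan and vault names are hypothetical).
backup = boto3.client("backup")
backup.create_backup_plan(
    BackupPlan={
        "BackupPlanName": "example-daily-plan",
        "Rules": [
            {
                "RuleName": "daily-0500-utc",
                "TargetBackupVaultName": "example-vault",
                "ScheduleExpression": "cron(0 5 * * ? *)",
                "Lifecycle": {"DeleteAfterDays": 35},
            }
        ],
    }
)
```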
Dynamic data classification
As you expand on your data classification strategy, move toward dynamic, continuous classification: automate discovery and classification so that new and changing data is identified and tagged as it enters your environment, rather than through periodic manual reviews.
Data usage monitoring
As you acquire data in varying services, such as Amazon EBS, Amazon EFS, and Amazon S3, as well as in data lakes and data warehouses, it becomes increasingly important to monitor this data and its usage. Build mechanisms that allow for reporting, analytics, alerts, and automated remediation across data types and locations. Understand how, where, and by whom your data is being used, so you can identify norms.
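One way to alert on departures from those norms is a CloudWatch anomaly detection alarm; this sketch (bucket and alarm names are hypothetical, and S3 request metrics must be enabled on the bucket) alarms when download volume exceeds the expected band.

```python
import boto3

# Alarm when BytesDownloaded for a bucket rises above its anomaly detection
# band (requires S3 request metrics with an "EntireBucket" filter; names are
# hypothetical).
cw = boto3.client("cloudwatch")
cw.put_metric_alarm(
    AlarmName="example-unusual-s3-downloads",
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=3,
    ThresholdMetricId="band",
    Metrics=[
        {
            "Id": "m1",
            "ReturnData": True,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/S3",
                    "MetricName": "BytesDownloaded",
                    "Dimensions": [
                        {"Name": "BucketName", "Value": "example-sensitive-data"},
                        {"Name": "FilterId", "Value": "EntireBucket"},
                    ],
                },
                "Period": 300,
                "Stat": "Sum",
            },
        },
        {
            "Id": "band",
            "ReturnData": True,
            "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
        },
    ],
)
```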
Summary
As part of advanced data protection operations in AWS, use all of the guidance outlined in this whitepaper, and add guidance and best practices that are applicable to your organization's regulatory and compliance needs. It is important that you create a defense in depth approach to data protection in AWS, and implement preventative, detective, and responsive mechanisms. Use mechanisms such as automated rescoping of IAM permissions based on observed usage patterns, and automatic enforcement of data classification and tagging. You can also automate enforcement of encrypted resources, alerting on abnormal data usage, and backup and recovery. Mine logs and analytics to identify new patterns, and iterate on your data protection policy. Automate wherever possible, and push data protection and security into the development pipeline so they are enforced early on. Finally, make sure that your mechanisms remain agile, so that when new services and features are released, you are able to explore them in development and securely promote them to production.