SEC07-BP01 Identify the data within your workload
It’s critical to understand the type and classification of data your workload is processing, the associated business processes, where the data is stored, and who the data owner is. You should also understand the legal and compliance requirements that apply to your workload, and what data controls need to be enforced. Identifying data is the first step in the data classification journey.
Benefits of establishing this best practice:
Data classification allows workload owners to identify locations that store sensitive data and determine how that data should be accessed and shared.
Data classification aims to answer the following questions:
-
What type of data do you have?
This could be data such as:
-
Intellectual property (IP) such as trade secrets, patents, or contract agreements.
-
Protected health information (PHI) such as medical records that contain medical history information connected to an individual.
-
Personally identifiable information (PII), such as name, address, date of birth, and national ID or registration number.
-
Credit card data, such as the Primary Account Number (PAN), cardholder name, expiration date, and service code number.
-
Where is the sensitive data stored?
-
Who can access, modify, and delete data?
-
Understanding user permissions is essential in guarding against potential data mishandling.
-
Who can perform create, read, update, and delete (CRUD) operations?
-
Account for potential escalation of privileges by understanding who can manage permissions to the data.
-
What business impact might occur if the data is disclosed unintentionally, altered, or deleted?
-
Understand the risk consequence if data is modified, deleted, or inadvertently disclosed.
By knowing the answers to these questions, you can take the following actions:
-
Decrease sensitive data scope (such as the number of sensitive data locations) and limit access to sensitive data to only approved users.
-
Gain an understanding of different data types so that you can implement appropriate data protection mechanisms and techniques, such as encryption, data loss prevention, and identity and access management.
-
Optimize costs by delivering the right control objectives for the data.
-
Confidently answer questions from regulators and auditors regarding the types and amount of data, and how data of different sensitivities are isolated from each other.
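The classification questions and follow-up actions above can be captured in a simple data inventory. The following is a minimal sketch, not a prescribed format: the `Sensitivity` tiers, the `DataAsset` fields, and the `sensitive_locations` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical tiers; substitute your organization's own scheme.
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass
class DataAsset:
    name: str
    data_types: list[str]   # what type of data, for example ["PII", "PAN"]
    location: str           # where the data is stored
    owner: str              # who the data owner is
    readers: set[str] = field(default_factory=set)            # who can read
    writers: set[str] = field(default_factory=set)            # who can modify or delete
    permission_admins: set[str] = field(default_factory=set)  # who can change access
    sensitivity: Sensitivity = Sensitivity.INTERNAL           # proxy for business impact


def sensitive_locations(assets, threshold=Sensitivity.CONFIDENTIAL):
    """Return the sorted storage locations holding data at or above the threshold."""
    return sorted(
        {a.location for a in assets if a.sensitivity.value >= threshold.value}
    )
```

An inventory like this makes the sensitive data scope measurable: the length of the list returned by `sensitive_locations` is the number of locations you would aim to reduce.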
Level of risk exposed if this best practice is not established: High
Implementation guidance
Data classification is the act of identifying the sensitivity of data. It might involve tagging to make the data easily searchable and trackable. Data classification can also help reduce the duplication of data, which in turn can lower storage and backup costs and speed up searches.
Use services such as Amazon Macie to automate both the discovery and classification of sensitive data at scale. Other services, such as Amazon EventBridge and AWS Config, can be used to automate remediation for data security issues such as unencrypted Amazon Simple Storage Service (Amazon S3) buckets, unencrypted Amazon Elastic Block Store (Amazon EBS) volumes, or untagged data resources. For a complete list of AWS service integrations, see the EventBridge documentation.
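As one illustration of wiring discovery to remediation, the sketch below builds an EventBridge event pattern that matches Macie findings by severity and registers it as a rule. It assumes the `events_client` argument is a boto3 EventBridge client (`boto3.client("events")`); the function names and the rule name are hypothetical, and a remediation target (for example, a Lambda function) would still need to be attached separately with `put_targets`.

```python
import json


def macie_finding_pattern(severities=("High",)):
    """Build an EventBridge event pattern matching Macie findings by severity."""
    return {
        "source": ["aws.macie"],
        "detail-type": ["Macie Finding"],
        # Severity labels are assumed to be "Low", "Medium", or "High".
        "detail": {"severity": {"description": list(severities)}},
    }


def create_finding_rule(events_client, rule_name, severities=("High",)):
    """Create or update the rule and return its ARN.

    events_client is expected to behave like boto3.client("events").
    """
    response = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(macie_finding_pattern(severities)),
        State="ENABLED",
    )
    return response["RuleArn"]
```

Passing the client in as a parameter keeps the pattern logic testable without AWS credentials.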
Detecting PII in unstructured data, such as customer emails, support tickets, product reviews, and social media, is possible by using Amazon Comprehend.
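A minimal sketch of PII detection with Amazon Comprehend follows. It assumes `comprehend_client` is a boto3 Comprehend client (`boto3.client("comprehend")`) and uses its `detect_pii_entities` operation; the `find_pii` helper and the `min_score` threshold of 0.8 are illustrative choices, not recommended values.

```python
def find_pii(comprehend_client, text, min_score=0.8, language_code="en"):
    """Return (entity_type, matched_text) pairs for PII detected in text.

    comprehend_client is expected to behave like boto3.client("comprehend").
    Entities scoring below min_score are dropped.
    """
    response = comprehend_client.detect_pii_entities(
        Text=text, LanguageCode=language_code
    )
    return [
        (entity["Type"], text[entity["BeginOffset"]:entity["EndOffset"]])
        for entity in response["Entities"]
        if entity["Score"] >= min_score
    ]
```

The response contains character offsets rather than the matched text, so the helper slices the original string to recover each detected value.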
Another method that supports data classification and protection is AWS resource tagging. Tagging allows you to assign metadata to your AWS resources that you can use to manage, identify, organize, search for, and filter resources.
In some cases, you might choose to tag entire resources (such as an S3 bucket), especially when a specific workload or service is expected to store, process, or transmit data of an already known classification.
Where appropriate, you can tag an S3 bucket instead of individual objects for ease of administration and security maintenance.
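Bucket-level tagging can be sketched as follows. This assumes `s3_client` is a boto3 S3 client; the `DataClassification` tag key and both helper names are hypothetical. Because `put_bucket_tagging` replaces the bucket's entire tag set, the sketch merges the new tag into any existing tags that the caller supplies (for example, from `get_bucket_tagging`).

```python
def merge_tag_set(existing, updates):
    """Merge a dict of tag updates into an S3-style TagSet list."""
    merged = {tag["Key"]: tag["Value"] for tag in existing}
    merged.update(updates)
    return [{"Key": key, "Value": value} for key, value in sorted(merged.items())]


def classify_bucket(s3_client, bucket, classification, existing_tags=()):
    """Tag a whole bucket with its data classification and return the tag set.

    s3_client is expected to behave like boto3.client("s3").
    existing_tags should hold the bucket's current TagSet so it is not lost.
    """
    tag_set = merge_tag_set(existing_tags, {"DataClassification": classification})
    s3_client.put_bucket_tagging(Bucket=bucket, Tagging={"TagSet": tag_set})
    return tag_set
```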
Implementation steps
Detect sensitive data within Amazon S3:
-
Before starting, make sure you have the appropriate permissions to access the Amazon Macie console and API operations. For additional details, see Getting started with Amazon Macie.
-
Use Amazon Macie to perform automated data discovery when your sensitive data resides in Amazon S3.
-
Use the Getting Started with Amazon Macie guide to configure a repository for sensitive data discovery results and create a discovery job for sensitive data.
-
To preview the sensitive data that Macie finds, see How to use Amazon Macie to preview sensitive data in S3 buckets.
By default, Macie analyzes objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. You can tailor the analysis by configuring Macie to use specific managed data identifiers, custom data identifiers, and allow lists when it performs automated sensitive data discovery for your account or organization. You can adjust the scope of the analysis by excluding specific buckets (for example, S3 buckets that typically store AWS logging data).
-
To configure and use automated sensitive data discovery, see Performing automated sensitive data discovery with Amazon Macie.
-
You might also consider Automated Data Discovery for Amazon Macie.
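The discovery job described in the steps above might be created programmatically along these lines. This is a sketch under assumptions: `macie_client` is a boto3 Macie client (`boto3.client("macie2")`), the helper names are hypothetical, and the job relies on Macie's default managed data identifiers rather than a tailored selection.

```python
import uuid


def one_time_job_request(name, account_id, buckets, sampling_percentage=100):
    """Build the request for a one-time sensitive data discovery job."""
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token for the request
        "jobType": "ONE_TIME",
        "name": name,
        "samplingPercentage": sampling_percentage,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }


def start_discovery_job(macie_client, request):
    """Submit the job and return its ID.

    macie_client is expected to behave like boto3.client("macie2").
    """
    return macie_client.create_classification_job(**request)["jobId"]
```

Lowering `sampling_percentage` trades coverage for cost when a bucket holds a large number of objects.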
Detect sensitive data within Amazon RDS:
For more information on data discovery in Amazon Relational Database Service (Amazon RDS) databases, see Enabling data classification for Amazon RDS database with Macie.
Detect sensitive data within DynamoDB:
-
Detecting sensitive data in DynamoDB with Macie explains how to use Amazon Macie to detect sensitive data in Amazon DynamoDB tables by exporting the data to Amazon S3 for scanning.
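One possible way to perform the export step is DynamoDB's native export to Amazon S3, sketched below. This is not necessarily the mechanism the referenced post uses. It assumes `dynamodb_client` is a boto3 DynamoDB client, that point-in-time recovery is enabled on the table, and that the `macie-scan/` prefix and helper name are placeholders.

```python
def export_table_for_scanning(dynamodb_client, table_arn, bucket, prefix="macie-scan/"):
    """Export a DynamoDB table to S3 so Macie can scan it; return the export ARN.

    dynamodb_client is expected to behave like boto3.client("dynamodb").
    The table must have point-in-time recovery enabled.
    """
    response = dynamodb_client.export_table_to_point_in_time(
        TableArn=table_arn,
        S3Bucket=bucket,
        S3Prefix=prefix,
        ExportFormat="DYNAMODB_JSON",
    )
    return response["ExportDescription"]["ExportArn"]
```

The export runs asynchronously, so a Macie job over the destination bucket should start only after the export completes.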
AWS Partner solutions:
-
Consider using our extensive AWS Partner Network. AWS Partners have extensive tools and compliance frameworks that directly integrate with AWS services. Partners can provide you with a tailored governance and compliance solution to help you meet your organizational needs.
-
For customized solutions in data classification, see Data governance in the age of regulation and compliance requirements.
You can automatically enforce the tagging standards that your organization adopts by creating and deploying policies using AWS Organizations. Tag policies let you specify rules that define valid key names and what values are valid for each key. You can choose to monitor only, which gives you an opportunity to evaluate and clean up your existing tags. After your tags are in compliance with your chosen standards, you can turn on enforcement in the tag policies to prevent non-compliant tags from being created. For more details, see Securing resource tags used for authorization using a service control policy in AWS Organizations.
-
To begin using tag policies in AWS Organizations, it’s strongly recommended that you follow the workflow in Getting started with tag policies before moving on to more advanced tag policies. Understanding the effects of attaching a simple tag policy to a single account before expanding to an entire organizational unit (OU) or organization allows you to see a tag policy’s effects before you enforce compliance with the tag policy. Getting started with tag policies provides links to instructions for more advanced policy-related tasks.
-
Consider evaluating other AWS services and features that support data classification, which are listed in the Data Classification whitepaper.
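To make the tag policy guidance above concrete, the sketch below builds a monitor-only tag policy for a data classification tag and registers it with AWS Organizations. The tag key, the allowed values, and the helper names are hypothetical; `org_client` is assumed to be a boto3 Organizations client. The policy deliberately omits `enforced_for`, so it reports non-compliant tags without blocking them. `noncompliant_tags` is a local pre-check for illustration only and is not part of any AWS API.

```python
import json

# Hypothetical classification values; use your organization's own scheme.
ALLOWED_VALUES = ["public", "internal", "confidential", "restricted"]


def classification_tag_policy(key="DataClassification", values=ALLOWED_VALUES):
    """Build a monitor-only tag policy document (no enforced_for block)."""
    return {
        "tags": {
            key: {
                "tag_key": {"@@assign": key},
                "tag_value": {"@@assign": list(values)},
            }
        }
    }


def noncompliant_tags(policy, resource_tags):
    """Return the tags on a resource whose values the policy does not allow."""
    problems = {}
    for rule in policy["tags"].values():
        tag_key = rule["tag_key"]["@@assign"]
        allowed = rule["tag_value"]["@@assign"]
        if tag_key in resource_tags and resource_tags[tag_key] not in allowed:
            problems[tag_key] = resource_tags[tag_key]
    return problems


def create_tag_policy(org_client, name, policy):
    """Create the policy and return its ID.

    org_client is expected to behave like boto3.client("organizations").
    """
    response = org_client.create_policy(
        Name=name,
        Description="Allowed values for the data classification tag",
        Type="TAG_POLICY",
        Content=json.dumps(policy),
    )
    return response["Policy"]["PolicySummary"]["Id"]
```

Creating a policy does not apply it; it takes effect only after it is attached to an account, OU, or organization root, which fits the recommendation to start with a single account.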
Resources
Related blogs:
-
How to use Amazon Macie to preview sensitive data in S3 buckets
-
Performing automated sensitive data discovery with Amazon Macie
-
Common techniques to detect PHI and PII data using AWS Services
-
Securing resource tags used for authorization using a service control policy in AWS Organizations
-
Enabling data classification for Amazon RDS database with Macie