SEC07-BP01 Identify the data within your workload - AWS Well-Architected Framework (2023-04-10)

SEC07-BP01 Identify the data within your workload

It’s critical to understand the type and classification of data your workload is processing, the associated business processes, where the data is stored, and who is the data owner. You should also have an understanding of the applicable legal and compliance requirements of your workload, and what data controls need to be enforced. Identifying data is the first step in the data classification journey.

Benefits of establishing this best practice:

Data classification allows workload owners to identify locations that store sensitive data and determine how that data should be accessed and shared.

Data classification aims to answer the following questions:

  • What type of data do you have?

    This could be data such as:

    • Intellectual property (IP) such as trade secrets, patents, or contract agreements.

    • Protected health information (PHI) such as medical records that contain medical history information connected to an individual.

    • Personally identifiable information (PII), such as name, address, date of birth, and national ID or registration number.

    • Credit card data, such as the Primary Account Number (PAN), cardholder name, expiration date, and service code number.

  • Where is the sensitive data stored?

  • Who can access, modify, and delete data?

    • Understanding user permissions is essential in guarding against potential data mishandling.

  • Who can perform create, read, update, and delete (CRUD) operations?

    • Account for potential escalation of privileges by understanding who can manage permissions to the data.

  • What business impact might occur if the data is disclosed unintentionally, altered, or deleted?

    • Understand the risk consequence if data is modified, deleted, or inadvertently disclosed.

By knowing the answers to these questions, you can take the following actions:

  • Decrease sensitive data scope (such as the number of sensitive data locations) and limit access to sensitive data to only approved users.

  • Gain an understanding of different data types so that you can implement appropriate data protection mechanisms and techniques, such as encryption, data loss prevention, and identity and access management.

  • Optimize costs by delivering the right control objectives for the data.

  • Confidently answer questions from regulators and auditors regarding the types and amounts of data, and how data of different sensitivity levels is isolated from other data.
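As a minimal sketch, the mapping from classification levels to control objectives described above can be captured in a simple lookup. The level names and controls here are hypothetical examples for illustration, not AWS-defined values:

```python
# Hypothetical mapping of data classification levels to control objectives.
# Level names and controls are illustrative, not AWS-defined values.
CLASSIFICATION_CONTROLS = {
    "Public":       {"encryption_at_rest": False, "access": "anyone"},
    "Internal":     {"encryption_at_rest": True,  "access": "employees"},
    "Confidential": {"encryption_at_rest": True,  "access": "approved-users"},
    "Restricted":   {"encryption_at_rest": True,  "access": "named-individuals"},
}

def controls_for(level: str) -> dict:
    """Return the control objectives required for a classification level."""
    try:
        return CLASSIFICATION_CONTROLS[level]
    except KeyError:
        raise ValueError(f"Unknown classification level: {level}")
```

A lookup like this gives workload owners one place to answer "what controls apply to this data?" and makes it easy to audit that every level has an explicit decision.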

Level of risk exposed if this best practice is not established: High

Implementation guidance

Data classification is the act of identifying the sensitivity of data. It might involve tagging to make the data easily searchable and trackable. Data classification also reduces the duplication of data, which can help reduce storage and backup costs while speeding up the search process.

Use services such as Amazon Macie to automate at scale both the discovery and classification of sensitive data. Other services, such as Amazon EventBridge and AWS Config, can be used to automate remediation for data security issues such as unencrypted Amazon Simple Storage Service (Amazon S3) buckets and Amazon Elastic Block Store (Amazon EBS) volumes, or untagged data resources. For a complete list of AWS service integrations, see the EventBridge documentation.

You can detect PII in unstructured data, such as customer emails, support tickets, product reviews, and social media posts, by using Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships such as people, places, sentiments, and topics in unstructured text. For a list of AWS services that can assist with data identification, see Common techniques to detect PHI and PII data using AWS services.
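To illustrate the idea behind pattern-based PII detection, here is a small standalone sketch. It uses two illustrative patterns (email addresses and candidate card numbers filtered by the Luhn checksum); a managed service such as Amazon Comprehend or Amazon Macie covers far more PII types with much higher accuracy, so this is not a substitute for either:

```python
import re

# Illustrative patterns only; managed services detect many more PII types.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
PAN_RE = re.compile(r"\b\d{13,16}\b")  # candidate primary account numbers

def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to filter random digit runs out of PAN candidates."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[0::2])
    total += sum(d * 2 - 9 if d * 2 > 9 else d * 2 for d in digits[1::2])
    return total % 10 == 0

def find_pii(text: str) -> dict:
    """Return detected PII candidates grouped by type."""
    return {
        "EMAIL": EMAIL_RE.findall(text),
        "CREDIT_CARD": [n for n in PAN_RE.findall(text) if luhn_valid(n)],
    }
```

Even in this toy form, the Luhn filter shows why classification needs more than raw regexes: a 16-digit order number should not be flagged as cardholder data.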

Another method that supports data classification and protection is AWS resource tagging. Tagging allows you to assign metadata to your AWS resources that you can use to manage, identify, organize, search for, and filter resources.

In some cases, you might choose to tag entire resources (such as an S3 bucket), especially when a specific workload or service is expected to store, process, or transmit data of an already known classification.

Where appropriate, you can tag an S3 bucket instead of individual objects for ease of administration and security maintenance.
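As a sketch of bucket-level tagging, the helper below builds the `Tagging` payload accepted by S3's `PutBucketTagging` API. The tag keys (`DataClassification`, `DataOwner`) and bucket name are example conventions of our own, not AWS-defined names:

```python
def classification_tagging(classification: str, owner: str) -> dict:
    """Build the Tagging payload for S3's PutBucketTagging API.

    The tag keys used here are example conventions, not AWS-defined names.
    """
    return {
        "TagSet": [
            {"Key": "DataClassification", "Value": classification},
            {"Key": "DataOwner", "Value": owner},
        ]
    }

# With boto3, the payload would be applied to a bucket like this:
#   s3 = boto3.client("s3")
#   s3.put_bucket_tagging(
#       Bucket="example-bucket",
#       Tagging=classification_tagging("Confidential", "data-platform"))
```

Tagging at the bucket level keeps the classification in one place, which is why grouping objects of the same sensitivity into dedicated buckets simplifies administration.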

Implementation steps

Detect sensitive data within Amazon S3:

  1. Before starting, make sure you have the appropriate permissions to access the Amazon Macie console and API operations. For additional details, see Getting started with Amazon Macie.

  2. Use Amazon Macie to perform automated data discovery when your sensitive data resides in Amazon S3.

    • Use the Getting Started with Amazon Macie guide to configure a repository for sensitive data discovery results and create a discovery job for sensitive data.

    • See How to use Amazon Macie to preview sensitive data in S3 buckets.

      By default, Macie analyzes objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. You can tailor the analysis by configuring Macie to use specific managed data identifiers, custom data identifiers, and allow lists when it performs automated sensitive data discovery for your account or organization. You can adjust the scope of the analysis by excluding specific buckets (for example, S3 buckets that typically store AWS logging data).

  3. To configure and use automated sensitive data discovery, see Performing automated sensitive data discovery with Amazon Macie.

  4. You might also consider Automated Data Discovery for Amazon Macie.
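The scope adjustment mentioned in the steps above (excluding buckets that typically store AWS logging data) can be sketched as a simple filter. The bucket-name prefixes here are hypothetical; in Macie itself, exclusions are configured in the automated discovery settings rather than in your own code:

```python
# Hypothetical prefixes for buckets that typically hold AWS logging data.
# In Macie, such buckets are excluded via the automated discovery settings.
EXCLUDED_PREFIXES = ("aws-logs-", "cloudtrail-", "elb-access-logs-")

def discovery_scope(bucket_names: list[str]) -> list[str]:
    """Return the buckets that should be in scope for sensitive data discovery."""
    return [b for b in bucket_names
            if not b.startswith(EXCLUDED_PREFIXES)]
```

Narrowing the scope this way keeps discovery jobs focused on buckets that can actually contain customer or business data, which also reduces the cost of repeated scans.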

Detect sensitive data within Amazon RDS:

For more information on data discovery in Amazon Relational Database Service (Amazon RDS) databases, see Enabling data classification for Amazon RDS database with Macie.

Detect sensitive data within DynamoDB:

Amazon Macie analyzes data stored in Amazon S3. A common pattern for classifying data held in Amazon DynamoDB is to export table contents to an S3 bucket and run Macie sensitive data discovery against the exported objects.

AWS Partner solutions:

  • Consider using the AWS Partner Network. AWS Partners offer tools and compliance frameworks that integrate directly with AWS services, and can provide a tailored governance and compliance solution to help you meet your organizational needs.

  • For customized solutions in data classification, see Data governance in the age of regulation and compliance requirements.

You can automatically enforce the tagging standards that your organization adopts by creating and deploying policies using AWS Organizations. Tag policies let you specify rules that define valid key names and what values are valid for each key. You can choose to monitor only, which gives you an opportunity to evaluate and clean up your existing tags. After your tags are in compliance with your chosen standards, you can turn on enforcement in the tag policies to prevent non-compliant tags from being created. For more details, see Securing resource tags used for authorization using a service control policy in AWS Organizations and the example policy on preventing tags from being modified except by authorized principals.

  • To begin using tag policies in AWS Organizations, follow the workflow in Getting started with tag policies before moving on to more advanced policies. Attaching a simple tag policy to a single account lets you evaluate its effects before you expand to an entire organizational unit (OU) or organization and turn on enforcement. Getting started with tag policies also links to instructions for more advanced policy-related tasks.

  • Consider evaluating other AWS services and features that support data classification, which are listed in the Data Classification whitepaper.
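The tag policy rules described above can be sketched as a generated policy document. The tag key (`DataClassification`), the allowed values, and the enforced resource type are assumptions chosen for illustration; the `@@assign` operators are the AWS Organizations tag policy syntax:

```python
import json

def data_classification_tag_policy(allowed_values: list[str]) -> str:
    """Build an AWS Organizations tag policy restricting the values of a
    DataClassification tag. Key name, values, and the enforced resource
    type are example choices, not required names."""
    policy = {
        "tags": {
            "DataClassification": {
                "tag_key": {"@@assign": "DataClassification"},
                "tag_value": {"@@assign": allowed_values},
                # Resource types where non-compliant tags are blocked once
                # enforcement is turned on in the tag policy.
                "enforced_for": {"@@assign": ["s3:bucket"]},
            }
        }
    }
    return json.dumps(policy, indent=2)
```

Generating the document from your approved classification levels keeps the tag policy in lockstep with the classification scheme itself, so a new level cannot be used in tags before it has been formally adopted.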
