Discovering sensitive data with Amazon Macie

To discover sensitive data with Amazon Macie, you create and run sensitive data discovery jobs. A sensitive data discovery job analyzes objects in Amazon Simple Storage Service (Amazon S3) buckets to determine whether the objects contain sensitive data, and it provides detailed reports of the sensitive data that it finds and the analysis that it performs. By creating and running sensitive data discovery jobs, you can automate discovery, logging, and reporting of sensitive data in your S3 buckets.

Each sensitive data discovery job automatically uses built-in criteria and techniques, such as machine learning and pattern matching, to analyze objects in S3 buckets. These techniques and criteria, referred to as managed data identifiers, can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of personally identifiable information (PII), personal health information (PHI), and financial data.

You can optionally supplement these managed data identifiers by creating custom data identifiers. A custom data identifier is a set of criteria that you define to detect sensitive data. The criteria consist of a regular expression (regex) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. If you configure a sensitive data discovery job to use this type of identifier, you can detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data—for example, employee IDs, customer account numbers, or internal data classifications.
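
To make the three ingredients of a custom data identifier concrete, the following is a minimal local sketch in Python. It is not Macie's detection engine, only an approximation of the idea: a regex finds candidate text, and an optional keyword list plus a proximity limit filter the candidates. The function name `find_matches`, the `E-` plus six digits employee ID format, and the exact proximity semantics are assumptions made for illustration.

```python
import re


def find_matches(text, pattern, keywords=(), max_distance=50):
    """Return regex matches that have one of the keywords close by.

    Simplified approximation (an assumption, not Macie's exact rule):
    a match is kept only if a keyword occurs before the match, no more
    than `max_distance` characters back from the end of the match.
    If no keywords are given, every regex match is kept.
    """
    results = []
    lowered = text.lower()
    for match in re.finditer(pattern, text):
        if not keywords:
            results.append(match.group(0))
            continue
        # Text that precedes the match, limited by the proximity rule.
        window_start = max(0, match.end() - max_distance)
        window = lowered[window_start:match.start()]
        if any(keyword.lower() in window for keyword in keywords):
            results.append(match.group(0))
    return results


# Hypothetical employee ID format: "E-" followed by six digits.
sample = (
    "Employee ID: E-123456 was updated. "
    "Unrelated code E-654321 appears later without the label."
)
print(find_matches(sample, r"E-\d{6}", keywords=["employee id"], max_distance=30))
```

In this sketch the keyword and the proximity limit drop the second candidate, which matches the pattern but has no nearby label. When you define a real identifier through the Macie API, the corresponding request parameters are, to the best of my knowledge, `regex`, `keywords`, and `maximumMatchDistance`; check the API reference for their exact semantics.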

When you create a job, you specify which S3 buckets you want the job to analyze. Macie can analyze data in a bucket if all of the following are true:

  • The data is stored in a supported file or storage format. For more information, see Supported file and storage formats.

  • If the data is encrypted, it’s encrypted using a key that Macie is allowed to use. For more information, see Analyzing encrypted S3 objects.

  • If the bucket has a restrictive bucket policy, the policy allows Macie to access objects in the bucket. For more information, see Allowing Macie to access S3 buckets and objects.

  • The data is stored directly in Amazon S3 and uses a supported storage class—S3 Intelligent-Tiering, S3 One Zone-IA, S3 Standard, or S3 Standard-IA. Macie can’t analyze data that’s stored in Amazon S3 Glacier or other AWS services.

    Although Macie is optimized for Amazon S3, you can use it to discover sensitive data that you currently store elsewhere. To do this, move the data to Amazon S3 temporarily or permanently. For example, you can export Amazon RDS or Amazon Aurora snapshots to Amazon S3 in Apache Parquet format, or export an Amazon DynamoDB table to Amazon S3. You can then create a job to analyze the data in Amazon S3.
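
The conditions in the preceding list can be expressed as a simple pre-flight check. The sketch below is illustrative only: the dictionary keys (`format_supported`, `encrypted`, `key_usable`, `policy_allows_macie`, `storage_class`) are hypothetical names for facts you would gather yourself, and the storage class strings assume the values that the Amazon S3 API reports for the four classes named above.

```python
# Storage classes listed as supported in the text, written as the strings
# the S3 API is assumed to report for them.
SUPPORTED_STORAGE_CLASSES = {
    "STANDARD",
    "STANDARD_IA",
    "ONEZONE_IA",
    "INTELLIGENT_TIERING",
}


def blocking_reasons(obj):
    """Return the list of reasons why an object could not be analyzed.

    `obj` is a plain dict with hypothetical keys; an empty list means
    none of the conditions in the list above is violated.
    """
    reasons = []
    if not obj.get("format_supported", False):
        reasons.append("unsupported file or storage format")
    if obj.get("encrypted", False) and not obj.get("key_usable", False):
        reasons.append("encrypted with a key Macie is not allowed to use")
    if not obj.get("policy_allows_macie", True):
        reasons.append("bucket policy does not allow Macie access")
    # Treat a missing storage class as STANDARD (an assumption of this sketch).
    if obj.get("storage_class", "STANDARD") not in SUPPORTED_STORAGE_CLASSES:
        reasons.append("unsupported storage class")
    return reasons


archived = {"format_supported": True, "storage_class": "GLACIER"}
print(blocking_reasons(archived))
```

Returning every blocking reason, instead of stopping at the first one, makes it easier to see everything that needs to change before a job can analyze the object.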

As you create and configure a job, you choose options that define the job's schedule and the scope of its analysis. You can run a job only once, for on-demand analysis and assessment, or on a recurring basis for periodic analysis, assessment, and monitoring. To define the breadth and depth of the analysis, you choose scope options for the job. These options include custom criteria that derive from properties of S3 buckets and objects, such as tags.
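
As a rough illustration of how schedule and scope options fit together, the following sketch assembles a job request as a plain Python dictionary. The field names follow my reading of the Macie `CreateClassificationJob` API and should be verified against the API reference before use. The helper name `build_job_request`, the job and bucket names, and the tag key and value are hypothetical.

```python
import uuid


def build_job_request(name, account_id, buckets, recurring=False,
                      tag_key=None, tag_value=None, custom_identifier_ids=()):
    """Assemble a request dict for a one-time or daily recurring job."""
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "SCHEDULED" if recurring else "ONE_TIME",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ],
        },
    }
    if recurring:
        # This sketch only covers a daily schedule.
        request["scheduleFrequency"] = {"dailySchedule": {}}
    if tag_key is not None:
        # Scope option: include only objects that carry the given tag.
        request["s3JobDefinition"]["scoping"] = {
            "includes": {
                "and": [{
                    "tagScopeTerm": {
                        "comparator": "EQ",
                        "key": "TAG",
                        "tagValues": [{"key": tag_key, "value": tag_value}],
                        "target": "S3_OBJECT",
                    }
                }]
            }
        }
    if custom_identifier_ids:
        request["customDataIdentifierIds"] = list(custom_identifier_ids)
    return request


request = build_job_request(
    "daily-pii-scan", "111122223333", ["amzn-s3-demo-bucket"],
    recurring=True, tag_key="classification", tag_value="confidential",
)
print(request["jobType"], sorted(request["s3JobDefinition"]))
```

Building the request separately from sending it keeps the example runnable without AWS credentials. With the AWS SDK for Python installed and configured, a dictionary like this could be passed as `boto3.client("macie2").create_classification_job(**request)`.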

To help you meet and maintain compliance with your data security and privacy requirements, each sensitive data discovery job produces two types of records: sensitive data findings and sensitive data discovery results. A sensitive data finding is a detailed report of sensitive data that Macie found in an object. A sensitive data discovery result is a record that logs details about the analysis of an object. Each type of record adheres to a standardized schema, which can help you query, monitor, and process the records by using other applications, services, and systems as necessary. For more information, see Reviewing job statistics and results.
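
Because the records follow a standardized schema, downstream processing can be fairly mechanical. The sketch below totals detections per bucket and sensitive data category from a list of finding records. The field paths (`resourcesAffected.s3Bucket.name` and `classificationDetails.result.sensitiveData[].detections[].count`) reflect my understanding of the finding schema and should be confirmed against the schema documentation. The sample record is invented and heavily trimmed, and `detections_per_bucket` is a hypothetical helper.

```python
from collections import Counter


def detections_per_bucket(findings):
    """Total detection counts per bucket and sensitive data category."""
    totals = {}
    for finding in findings:
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        counter = totals.setdefault(bucket, Counter())
        result = finding.get("classificationDetails", {}).get("result", {})
        for group in result.get("sensitiveData", []):
            for detection in group.get("detections", []):
                counter[group["category"]] += detection.get("count", 0)
    # Convert the counters to plain dicts for easy serialization.
    return {bucket: dict(counter) for bucket, counter in totals.items()}


# Invented, trimmed-down record for illustration only.
sample_findings = [{
    "type": "SensitiveData:S3Object/Personal",
    "resourcesAffected": {
        "s3Bucket": {"name": "amzn-s3-demo-bucket"},
        "s3Object": {"key": "exports/customers.csv"},
    },
    "classificationDetails": {"result": {"sensitiveData": [{
        "category": "PERSONAL_INFORMATION",
        "detections": [
            {"type": "NAME", "count": 2},
            {"type": "ADDRESS", "count": 1},
        ],
    }]}},
}]
print(detections_per_bucket(sample_findings))
```

A summary like this could feed a dashboard or an alerting rule; the same traversal works on records retrieved through the API or read from exported files, provided they share the schema.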