Configuring automated sensitive data discovery for your account - Amazon Macie

With automated sensitive data discovery, Amazon Macie continually selects sample objects from your Amazon Simple Storage Service (Amazon S3) buckets and analyzes the objects to determine whether they contain sensitive data. If you're the Macie administrator for an organization, this includes objects in S3 buckets that your member accounts own. As the analyses progress, Macie updates statistics, inventory data, and other information that it provides about your Amazon S3 data. Macie also produces records of the sensitive data it finds and the analysis that it performs.

To configure and use automated sensitive data discovery, your account must be a standalone Macie account or the Macie administrator account for an organization. If you have a member account and want to perform automated discovery for your S3 buckets, contact the Macie administrator for your organization. For more information, see Managing multiple accounts.

When you enable, configure, or disable automated sensitive data discovery for your account, your changes apply only to the current AWS Region. To make the same changes in additional Regions, repeat the applicable steps in each additional Region.
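Because these settings are Regional, a script that manages several Regions has to repeat each change once per Region. The following Python sketch shows one way to structure that loop. It's illustrative only: `apply_in_regions` is a hypothetical helper, and it assumes you supply a `client_factory` that returns a Macie client for a Region name, such as `lambda r: boto3.client("macie2", region_name=r)` with the AWS SDK for Python (Boto3).

```python
from typing import Any, Callable, Dict, Iterable


def apply_in_regions(
    regions: Iterable[str],
    client_factory: Callable[[str], Any],
    action: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Apply the same Macie change in each Region and collect the responses.

    Hypothetical helper: automated discovery settings are Regional, so the
    action runs once per Region with a Region-specific client.
    """
    responses: Dict[str, Any] = {}
    for region in regions:
        client = client_factory(region)  # assumed to return a "macie2" client
        responses[region] = action(client)
    return responses
```

For example, passing `lambda c: c.update_automated_discovery_configuration(status="ENABLED")` as `action` would enable automated discovery in each listed Region. Injecting the client factory keeps the loop independent of credentials, which also makes it easy to exercise with a stand-in client.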

Before you begin

Amazon Macie creates a sensitive data discovery result for each S3 object that it analyzes while performing automated sensitive data discovery for your account. A sensitive data discovery result is a record that logs details about the analysis that Macie performed on an object. Macie also creates results for objects in which it doesn't find sensitive data (and that therefore don't produce sensitive data findings) and for objects that it can't analyze because of errors or issues such as permissions settings. If Macie does find sensitive data in an object, the result includes data from the corresponding sensitive data finding, along with additional information. These results provide analysis records that can be helpful for data privacy and protection audits or investigations.

Macie stores your sensitive data discovery results for only 90 days. To access the results and retain them long term, configure Macie to store them in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results.

To verify that you've configured this repository for your account, choose Discovery results in the navigation pane on the Amazon Macie console. If you prefer to do this programmatically, you can use the GetClassificationExportConfiguration operation of the Amazon Macie API. To learn how to configure this repository, see Storing and retaining sensitive data discovery results.
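As a minimal sketch of the programmatic check, the following Python function calls GetClassificationExportConfiguration through a Boto3 `macie2` client and returns the configured bucket name, if any. The function name is hypothetical, and it assumes the response omits `configuration.s3Destination` when no repository is configured.

```python
from typing import Any, Optional


def results_repository_bucket(client: Any) -> Optional[str]:
    """Return the S3 bucket used as the results repository, or None.

    `client` is assumed to be a Boto3 "macie2" client for the current Region.
    Assumption: an unconfigured repository yields no s3Destination settings.
    """
    response = client.get_classification_export_configuration()
    destination = response.get("configuration", {}).get("s3Destination") or {}
    return destination.get("bucketName")
```

A `None` result suggests that you still need to configure the repository before relying on long-term access to the results.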

If you've configured the repository, Macie creates a folder named automated-sensitive-data-discovery in the repository when automated sensitive data discovery is initially enabled for your account. This folder stores sensitive data discovery results that Macie creates while performing automated discovery for your account.

Enabling automated sensitive data discovery for your account

When you enable automated sensitive data discovery for your account, Amazon Macie begins evaluating your Amazon S3 inventory data and performing other automated discovery activities for your account in the current AWS Region. Depending on the size of your Amazon S3 data estate, sensitive data discovery statistics and other results can begin to appear within 48 hours of enabling automated discovery for your account.

Follow these steps to enable automated sensitive data discovery for your account by using the Amazon Macie console. To enable automated discovery programmatically, use the UpdateAutomatedDiscoveryConfiguration operation of the Amazon Macie API.
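The following Python sketch enables automated discovery only if it isn't already enabled in the client's Region. It assumes a Boto3 `macie2` client, and the helper name is hypothetical. It reads the current status with GetAutomatedDiscoveryConfiguration before calling UpdateAutomatedDiscoveryConfiguration with `status="ENABLED"`.

```python
from typing import Any


def ensure_automated_discovery_enabled(client: Any) -> bool:
    """Enable automated discovery in the client's Region if it's disabled.

    Returns True if this call changed the status, or False if automated
    discovery was already enabled. `client` is assumed to be a Boto3
    "macie2" client.
    """
    current = client.get_automated_discovery_configuration()
    if current.get("status") == "ENABLED":
        return False  # nothing to do; avoid a redundant update
    client.update_automated_discovery_configuration(status="ENABLED")
    return True
```

Checking first makes the sketch safe to rerun, for example from a scheduled job that keeps several Regions consistent.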

To enable automated sensitive data discovery for your account
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to enable automated sensitive data discovery.

  3. In the navigation pane, under Settings, choose Automated discovery.

  4. In the Status section, choose Enable.

  5. When prompted for confirmation, choose Enable.

After you enable automated sensitive data discovery, review and configure your settings to refine the analyses that Macie subsequently performs.

Configuring automated sensitive data discovery settings for your account

If automated sensitive data discovery is enabled for your account, you can adjust the automated discovery settings for your account to refine the analyses that Amazon Macie performs. These settings specify which S3 buckets you want to include in the analyses. They also specify which types and occurrences of sensitive data you want Macie to detect and report—the managed data identifiers, custom data identifiers, and allow lists to use when analyzing S3 objects.

By default, Macie performs automated sensitive data discovery for all the S3 buckets that it monitors and analyzes for your account. If you're the Macie administrator for an organization, this includes S3 buckets that your member accounts own. You can exclude specific buckets from the analyses. For example, you might exclude buckets that typically store AWS logging data, such as AWS CloudTrail event logs. If you exclude a bucket, you can subsequently include it again.

In addition, by default Macie analyzes S3 objects by using only the set of managed data identifiers that we recommend for automated sensitive data discovery. It doesn't use any custom data identifiers or allow lists that you've defined. To customize the analyses, you can configure Macie to use specific allow lists, custom data identifiers, and managed data identifiers.

The following sections provide additional information about each type of setting and explain how to change it by using the Amazon Macie console. To review the settings programmatically, use the GetClassificationScope and GetSensitivityInspectionTemplate operations of the Amazon Macie API. To change them, use UpdateClassificationScope, to specify which S3 buckets to include in the analyses, and UpdateSensitivityInspectionTemplate, to specify which allow lists, custom data identifiers, and managed data identifiers to use.
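UpdateClassificationScope and UpdateSensitivityInspectionTemplate each take the ID of the resource to change. A sketch of looking up those IDs in Python, assuming a Boto3 `macie2` client and assuming that the GetAutomatedDiscoveryConfiguration response includes `classificationScopeId` and `sensitivityInspectionTemplateId` fields (check the API reference for your SDK version); the helper name is hypothetical:

```python
from typing import Any, Tuple


def discovery_setting_ids(client: Any) -> Tuple[str, str]:
    """Return (classification scope ID, sensitivity inspection template ID).

    Assumption: both IDs are reported by GetAutomatedDiscoveryConfiguration
    for the client's Region. Pass them as `id` to the update operations.
    """
    config = client.get_automated_discovery_configuration()
    return config["classificationScopeId"], config["sensitivityInspectionTemplateId"]
```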

If you change a setting, Macie applies your change when the next evaluation and analysis cycle starts for automated sensitive data discovery, typically within 24 hours.

To refine the scope of the analyses, you can exclude as many as 1,000 S3 buckets from automated sensitive data discovery.

If you exclude an S3 bucket, Macie stops analyzing objects in the bucket when it performs automated sensitive data discovery for your account. Existing sensitive data discovery statistics and details for the bucket persist—for example, the bucket's current sensitivity score remains unchanged. After you exclude a bucket, you can subsequently include it again.
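In the API, excluding and including buckets corresponds to the `excludes` settings of UpdateClassificationScope. The following Python sketch builds the `s3` argument for that operation and checks the 1,000-bucket limit before sending the request. The helper is hypothetical, and the count check for `"ADD"` assumes none of the new bucket names are already excluded.

```python
from typing import Any, Dict, List

MAX_EXCLUDED_BUCKETS = 1000  # limit stated in this guide
OPERATIONS = ("ADD", "REMOVE", "REPLACE")


def exclusion_update(
    bucket_names: List[str],
    operation: str,
    currently_excluded: int = 0,
) -> Dict[str, Any]:
    """Build the `s3` argument for UpdateClassificationScope (hypothetical helper).

    operation: "ADD" excludes the buckets, "REMOVE" includes them again, and
    "REPLACE" overwrites the entire exclude list.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    names = sorted(set(bucket_names))  # drop duplicates; sorted for readability
    if operation == "ADD":
        # Assumes none of the new names are already on the exclude list.
        total = currently_excluded + len(names)
    elif operation == "REPLACE":
        total = len(names)
    else:
        total = 0  # removing buckets can't push the list over the limit
    if total > MAX_EXCLUDED_BUCKETS:
        raise ValueError(f"{total} excluded buckets exceeds the {MAX_EXCLUDED_BUCKETS}-bucket limit")
    return {"excludes": {"bucketNames": names, "operation": operation}}
```

For example, `client.update_classification_scope(id=scope_id, s3=exclusion_update(["my-cloudtrail-logs"], "ADD"))` would add a (hypothetical) bucket to the exclude list, and the same call with `"REMOVE"` would include it again.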

To exclude or include specific S3 buckets
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to exclude or include specific S3 buckets in automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated discovery.

    The Automated sensitive data discovery page appears and displays your current settings. On that page, the S3 buckets section lists S3 buckets that are currently excluded, or it indicates that all buckets are currently included.

  4. In the S3 buckets section, choose Edit.

  5. Do one of the following:

    • To exclude one or more S3 buckets, choose Add buckets to the exclude list. Then, in the S3 buckets table, select the check box for each bucket that you want to exclude. The table lists all the S3 buckets for your account in the current Region.

    • To include one or more S3 buckets that you previously excluded, choose Remove buckets from the exclude list. Then, in the S3 buckets table, select the check box for each bucket that you want to include. The table lists all the buckets that are currently excluded from automated sensitive data discovery.

    To find specific buckets more easily, enter filter criteria in the filter bar above the table. You can also sort the table by bucket name.

  6. When you finish selecting buckets, choose Add or Remove, depending on the option that you chose in the preceding step.

A managed data identifier is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, bank account numbers, AWS secret access keys, or passport numbers for a particular country or region. By default, Macie analyzes S3 objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. To review the list of identifiers included in this set, see Default settings for automated sensitive data discovery.

You can tailor the analyses to focus on specific types of sensitive data: add managed data identifiers for the types of sensitive data that you want Macie to detect and report, and remove managed data identifiers for the types of sensitive data that you don't want Macie to detect and report. If you remove a managed data identifier, your change doesn't affect existing sensitive data discovery statistics and details for your S3 buckets. For example, if you remove the managed data identifier that detects AWS secret access keys and Macie previously detected that type of sensitive data in a bucket, Macie continues to report those detections for the bucket.

Tip

Instead of removing a managed data identifier from subsequent analyses of all S3 buckets, you can exclude that type of detection from the sensitivity score for specific buckets. For more information, see Managing automated sensitive data discovery for individual S3 buckets.

To add or remove managed data identifiers
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove managed data identifiers from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated discovery.

    On the Automated sensitive data discovery page, the Managed data identifiers section displays your current settings, organized into two tabs:

    • Added to default – This tab lists managed data identifiers that you explicitly added. Macie uses these managed data identifiers in addition to the managed data identifiers in the default set that you haven't explicitly removed.

    • Removed from default – This tab lists managed data identifiers that you explicitly removed. Macie doesn't use these managed data identifiers.

  4. In the Managed data identifiers section, choose Edit.

  5. Do any of the following:

    • To add one or more managed data identifiers, choose the Added to default tab. Then, in the table, select the check box for each managed data identifier that you want to add. If a check box is already selected, you've already added that identifier.

    • To remove one or more managed data identifiers, choose the Removed from default tab. Then, in the table, select the check box for each managed data identifier that you want to remove. If a check box is already selected, you've already removed that identifier.

    On each tab, the table displays a list of all the managed data identifiers that Macie currently provides. In the table, each managed data identifier's ID describes the type of sensitive data that the identifier is designed to detect—for example, CREDIT_CARD_SECURITY_CODE for credit card verification codes. To find specific managed data identifiers more easily, enter filter criteria in the filter bar above the table. You can also sort the table by choosing a column heading. For details about each identifier, see Using managed data identifiers.

  6. When you finish, choose Save.
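Programmatically, the two tabs appear to correspond to the `includes` and `excludes` settings of the sensitivity inspection template. The following Python sketch assumes that mapping, and also assumes that UpdateSensitivityInspectionTemplate replaces the template's existing settings, so it starts from the current values returned by GetSensitivityInspectionTemplate and returns merged `includes` and `excludes` arguments. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable


def update_managed_identifiers(
    template: Dict[str, Any],
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return new `includes`/`excludes` values for managed data identifiers.

    `template` is a GetSensitivityInspectionTemplate response. Assumed mapping:
    includes.managedDataIdentifierIds is "Added to default" and
    excludes.managedDataIdentifierIds is "Removed from default".
    """
    includes = dict(template.get("includes") or {})  # keeps allow lists, custom IDs
    excludes = dict(template.get("excludes") or {})
    added = set(includes.get("managedDataIdentifierIds", []))
    removed = set(excludes.get("managedDataIdentifierIds", []))
    for ident in add:
        added.add(ident)
        removed.discard(ident)  # keep each identifier in at most one list
    for ident in remove:
        removed.add(ident)
        added.discard(ident)
    includes["managedDataIdentifierIds"] = sorted(added)
    excludes["managedDataIdentifierIds"] = sorted(removed)
    return {"includes": includes, "excludes": excludes}
```

You might then call `client.update_sensitivity_inspection_template(id=template_id, **update_managed_identifiers(current, add=["CREDIT_CARD_SECURITY_CODE"]))`. The sketch keeps each identifier in at most one of the two lists, on the assumption that adding and removing the same identifier at once isn't meaningful.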

A custom data identifier is a set of criteria that you define to detect sensitive data. The criteria consist of a regular expression (regex) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. To learn more, see Building custom data identifiers.

By default, Amazon Macie doesn't use custom data identifiers when it performs automated sensitive data discovery. If you want Macie to use specific custom data identifiers, you can add them to the analyses. Macie then uses the custom data identifiers in addition to any managed data identifiers that you also configured Macie to use.

If you add a custom data identifier to the analyses, you can subsequently remove it. Removing it doesn't affect existing sensitive data discovery statistics and details for your S3 buckets. For example, if you remove a custom data identifier that previously produced detections for a bucket, Macie continues to report those detections for the bucket. Instead of removing an identifier from subsequent analyses of all buckets, consider excluding that type of detection from the sensitivity score for specific buckets. For more information, see Managing automated sensitive data discovery for individual S3 buckets.
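Custom data identifiers (and, in the same way, allow lists) are selected through the `includes` settings of the sensitivity inspection template. The following Python sketch adds or removes IDs in one of those fields while carrying the other include settings over unchanged. The helper is hypothetical, and it assumes the update operation replaces existing settings rather than merging them.

```python
from typing import Any, Dict, Iterable

INCLUDE_FIELDS = ("customDataIdentifierIds", "allowListIds")


def update_included_ids(
    template: Dict[str, Any],
    field: str,
    add: Iterable[str] = (),
    remove: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a new `includes` value with IDs added to or removed from `field`.

    `template` is a GetSensitivityInspectionTemplate response. Other include
    settings, such as managed data identifiers, are copied over unchanged.
    """
    if field not in INCLUDE_FIELDS:
        raise ValueError(f"unsupported field: {field}")
    includes = dict(template.get("includes") or {})
    ids = set(includes.get(field, []))
    ids.update(add)
    ids.difference_update(remove)
    includes[field] = sorted(ids)
    return includes
```

For example: `client.update_sensitivity_inspection_template(id=template_id, includes=update_included_ids(current, "customDataIdentifierIds", add=[cdi_id]), excludes=current.get("excludes", {}))`, where `current` is the GetSensitivityInspectionTemplate response and `cdi_id` is a hypothetical custom data identifier ID.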

To add or remove custom data identifiers
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove custom data identifiers from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated discovery.

    The Automated sensitive data discovery page displays your current settings. On that page, the Custom data identifiers section lists custom data identifiers that you've added, or it indicates that you haven't selected any custom data identifiers for automated discovery.

  4. In the Custom data identifiers section, choose Edit.

  5. Do any of the following:

    • To add one or more custom data identifiers, select the check box for each custom data identifier that you want to add. If a check box is already selected, you've already added that identifier.

    • To remove one or more custom data identifiers, clear the check box for each custom data identifier that you want to remove. If a check box is already cleared, Macie doesn't currently use that identifier when performing automated discovery.

    Tip

    To review or test the settings for a custom data identifier before you add or remove it, choose the link icon (a box with an arrow) next to the identifier's name. Macie opens a page that displays the identifier's settings.

    You can also use this page to test the identifier with sample data. To do this, enter up to 1,000 characters of text in the Sample data box, and then choose Test. Macie evaluates the sample data by using the identifier, and then reports the number of matches.

  6. When you finish, choose Save.

In Amazon Macie, an allow list defines specific text or a text pattern that you want Macie to ignore when it inspects S3 objects for sensitive data. If text matches an entry or pattern in an allow list, Macie doesn’t report the text, even if the text matches the criteria of a managed data identifier or a custom data identifier. To learn more, see Defining sensitive data exceptions with allow lists.
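To make the allow-list behavior concrete, here is a conceptual Python model that drops detected text matching an allow-list entry or pattern. It isn't how Macie implements allow lists: the function is hypothetical, the exact, case-sensitive entry comparison and the full-match regex are simplifying assumptions, and a real allow list contains either predefined text or a regex, not both.

```python
import re
from typing import Iterable, List, Sequence


def apply_allow_list(
    detections: Iterable[str],
    entries: Sequence[str] = (),
    pattern: str = "",
) -> List[str]:
    """Return the detected text that an allow list would NOT suppress.

    Conceptual model only: exact, case-sensitive entry matching and a
    full-match regex are simplifying assumptions, not Macie's rules.
    """
    allowed = set(entries)
    compiled = re.compile(pattern) if pattern else None
    kept: List[str] = []
    for text in detections:
        if text in allowed:
            continue  # matches a predefined text entry, so it isn't reported
        if compiled is not None and compiled.fullmatch(text):
            continue  # matches the allow-list pattern, so it isn't reported
        kept.append(text)
    return kept
```

In this model, text is suppressed even though a data identifier matched it, which mirrors the behavior described above.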

By default, Macie doesn't use allow lists when it performs automated sensitive data discovery. If you want Macie to use specific allow lists, you can add them to the analyses. If you add an allow list to the analyses, you can subsequently remove it.

To add or remove allow lists
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove allow lists from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated discovery.

    The Automated sensitive data discovery page displays your current settings. On that page, the Allow lists section indicates which allow lists you've added, or it indicates that you haven't selected any allow lists for automated discovery.

  4. In the Allow lists section, choose Edit.

  5. Do any of the following:

    • To add one or more allow lists, select the check box for each allow list that you want to add. If a check box is already selected, you've already added that list.

    • To remove one or more allow lists, clear the check box for each allow list that you want to remove. If a check box is already cleared, Macie doesn't currently use that list when performing automated discovery.

    Tip

    To review the settings for an allow list before you add or remove it, choose the link icon (a box with an arrow) next to the list's name. Macie opens a page that displays the list's settings.

  6. When you finish, choose Save.

Disabling automated sensitive data discovery for your account

You can disable automated sensitive data discovery for your account at any time. If you do, Macie stops performing all automated discovery activities for your account before the next evaluation and analysis cycle starts, typically within 24 hours. In addition, you lose access to all statistical data, inventory data, and other information that Macie produced and directly provided while performing those activities. For example, your S3 bucket inventory no longer includes sensitivity scores, sensitivity visualizations, or analysis statistics and details for individual S3 buckets.

You can continue to access sensitive data findings that Macie produced while performing automated discovery for your account. Macie stores your findings for 90 days. In addition, data that you stored or published to other AWS services, such as sensitive data discovery results in Amazon S3 and finding events in Amazon EventBridge, remains intact and isn't affected.

If you disable automated sensitive data discovery for your account, you can enable it again. Macie then resumes all automated discovery activities for your account. If you re-enable it within 30 days of disabling it, you regain access to all statistical data, inventory data, and other information that Macie previously produced and directly provided while performing those activities. If you don't re-enable it within 30 days, Macie permanently deletes the statistical data and other information that it previously produced and provided.
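The 30-day window is simple date arithmetic. The following Python sketch checks whether re-enabling at a given time would fall inside it. It assumes `disabled_at` is a timezone-aware `datetime`, such as the `disabledAt` value in a GetAutomatedDiscoveryConfiguration response from Boto3, and it treats exactly 30 days as inside the window, which is an assumption about the boundary. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESTORE_WINDOW = timedelta(days=30)  # window stated in this guide


def can_restore_previous_data(disabled_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if re-enabling at `now` is within 30 days of disabling.

    Assumptions: both datetimes are timezone-aware, and the boundary
    (exactly 30 days) counts as inside the window.
    """
    now = now or datetime.now(timezone.utc)
    return now - disabled_at <= RESTORE_WINDOW
```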

Follow these steps to disable automated sensitive data discovery for your account by using the Amazon Macie console. To disable automated discovery programmatically, use the UpdateAutomatedDiscoveryConfiguration operation of the Amazon Macie API.

To disable automated sensitive data discovery for your account
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to disable automated sensitive data discovery.

  3. In the navigation pane, under Settings, choose Automated discovery.

  4. In the Status section, choose Disable.

  5. When prompted for confirmation, choose Disable.