Configuring automated sensitive data discovery

With automated sensitive data discovery, Amazon Macie continually selects sample objects from your Amazon Simple Storage Service (Amazon S3) general purpose buckets and analyzes the objects to determine whether they contain sensitive data. If you're the Macie administrator for an organization, by default this includes objects in S3 buckets that your member accounts own. As the analyses progress, Macie updates statistics, inventory data, and other information that it provides about your Amazon S3 data. Macie also produces records of the sensitive data it finds and the analysis that it performs.

To configure and manage automated sensitive data discovery, your account must be the Macie administrator account for an organization or a standalone Macie account. If your account is part of an organization, only the Macie administrator for your organization can enable or disable automated sensitive data discovery for accounts in your organization. In addition, only the Macie administrator can configure automated sensitive data discovery settings for the accounts. If you have a member account and you want Macie to perform automated sensitive data discovery for your S3 buckets, contact your Macie administrator.

When you enable, configure, or disable automated sensitive data discovery, your changes apply only to the current AWS Region. To make the same changes in additional Regions, repeat the applicable steps in each additional Region.
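Because each change applies to a single Region, a script can repeat the same call for every Region that you use. The following is a minimal sketch, not a definitive implementation. It assumes the AWS SDK for Python (Boto3) and the UpdateAutomatedDiscoveryConfiguration operation; the `client_factory` parameter and the function name are hypothetical.

```python
from typing import Callable, Dict, Iterable


def set_status_in_regions(
    client_factory: Callable[[str], object],
    regions: Iterable[str],
    status: str = "ENABLED",
) -> Dict[str, str]:
    """Apply the same automated discovery status in each Region.

    client_factory is a hypothetical helper that returns a Macie client
    for a Region, for example:
        lambda region: boto3.client("macie2", region_name=region)
    """
    outcomes = {}
    for region in regions:
        client = client_factory(region)
        try:
            client.update_automated_discovery_configuration(status=status)
            outcomes[region] = "ok"
        except Exception as error:
            # Record the failure and continue with the remaining Regions.
            outcomes[region] = f"failed: {error}"
    return outcomes
```

The function returns one outcome per Region, so a failure in one Region doesn't hide the results for the others.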

Before you begin

Before you enable or configure automated sensitive data discovery, complete the following tasks to ensure that you have the resources and permissions that you need.

These tasks are optional if you already enabled and configured automated sensitive data discovery and only want to change the settings or disable it.

Configure a repository for sensitive data discovery results

When Amazon Macie performs automated sensitive data discovery, it creates an analysis record for each Amazon Simple Storage Service (Amazon S3) object that it selects for analysis. These records, referred to as sensitive data discovery results, log details about the analysis of individual S3 objects. This includes objects that Macie doesn't find sensitive data in, and objects that Macie can't analyze due to errors or issues such as permissions settings. If Macie finds sensitive data in an object, the sensitive data discovery result includes information about the sensitive data that Macie found. Sensitive data discovery results provide you with analysis records that can be helpful for data privacy and protection audits or investigations.

Macie stores your sensitive data discovery results for only 90 days. To access the results and enable long-term storage and retention of them, configure Macie to store the results in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results.

To verify that you configured this repository, choose Discovery results in the navigation pane on the Amazon Macie console. If you prefer to do this programmatically, use the GetClassificationExportConfiguration operation of the Amazon Macie API. To learn more about sensitive data discovery results and how to configure this repository, see Storing and retaining sensitive data discovery results.
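As an illustration of the programmatic check, the following sketch calls GetClassificationExportConfiguration through a Boto3 Macie client and summarizes the result. It assumes that the response nests the bucket settings under `configuration` and `s3Destination`; the function name is hypothetical.

```python
def describe_results_repository(client) -> str:
    """Summarize where Macie stores sensitive data discovery results.

    client is expected to be a Boto3 Macie client, for example
    boto3.client("macie2"), created for the Region that you want to check.
    """
    response = client.get_classification_export_configuration()
    destination = response.get("configuration", {}).get("s3Destination")
    if not destination:
        return "No repository configured; results are kept for 90 days only."
    prefix = destination.get("keyPrefix", "")
    return f"Results are stored in s3://{destination['bucketName']}/{prefix}"
```

Run the check in each Region where you use Macie, because the repository setting is specific to a Region.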

If you configured the repository, Macie creates a folder named automated-sensitive-data-discovery in the repository when you enable automated sensitive data discovery for the first time. This folder stores sensitive data discovery results that Macie creates while performing automated discovery for your account or organization.

Verify your permissions

To verify your permissions, use AWS Identity and Access Management (IAM) to review the IAM policies that are attached to your IAM identity. Then compare the information in those policies to the following list of actions that you must be allowed to perform:

  • macie2:GetMacieSession

  • macie2:UpdateAutomatedDiscoveryConfiguration

  • macie2:ListClassificationScopes

  • macie2:UpdateClassificationScope

  • macie2:ListSensitivityInspectionTemplates

  • macie2:UpdateSensitivityInspectionTemplate

The first action allows you to access your Amazon Macie account. The second action allows you to enable or disable automated sensitive data discovery for your account or organization. For an organization, it also allows you to automatically enable automated sensitive data discovery for accounts in your organization. The remaining actions allow you to identify and change the configuration settings.

If you plan to use the Amazon Macie console to review or change the configuration settings, also verify that you're allowed to perform the following actions:

  • macie2:GetAutomatedDiscoveryConfiguration

  • macie2:GetClassificationScope

  • macie2:GetSensitivityInspectionTemplate

These actions allow you to retrieve your current configuration settings and the status of automated sensitive data discovery for your account or organization. Permission to perform these actions is optional if you plan to change the configuration settings programmatically.

If you're the Macie administrator for an organization, you also need to be allowed to perform the following actions:

  • macie2:ListAutomatedDiscoveryAccounts

  • macie2:BatchUpdateAutomatedDiscoveryAccounts

The first action allows you to retrieve the status of automated sensitive data discovery for individual accounts in your organization. The second action allows you to enable or disable automated sensitive data discovery for individual accounts in your organization.

If you're not allowed to perform the requisite actions, ask your AWS administrator for assistance.
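To make the comparison easier, you can assemble the actions listed above into a policy document and review it against your existing policies. The following sketch builds such a document. The statement ID and the use of `"Resource": "*"` are assumptions for illustration; your AWS administrator might scope permissions differently.

```python
import json

# Actions required to enable, disable, and configure automated discovery.
REQUIRED_ACTIONS = [
    "macie2:GetMacieSession",
    "macie2:UpdateAutomatedDiscoveryConfiguration",
    "macie2:ListClassificationScopes",
    "macie2:UpdateClassificationScope",
    "macie2:ListSensitivityInspectionTemplates",
    "macie2:UpdateSensitivityInspectionTemplate",
]
# Additional actions for reviewing or changing settings on the console.
CONSOLE_ACTIONS = [
    "macie2:GetAutomatedDiscoveryConfiguration",
    "macie2:GetClassificationScope",
    "macie2:GetSensitivityInspectionTemplate",
]
# Additional actions for the Macie administrator of an organization.
ORG_ADMIN_ACTIONS = [
    "macie2:ListAutomatedDiscoveryAccounts",
    "macie2:BatchUpdateAutomatedDiscoveryAccounts",
]


def build_policy(use_console: bool, org_admin: bool) -> dict:
    """Return an IAM policy document that allows the applicable actions."""
    actions = list(REQUIRED_ACTIONS)
    if use_console:
        actions += CONSOLE_ACTIONS
    if org_admin:
        actions += ORG_ADMIN_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AutomatedDiscoveryConfiguration",  # hypothetical name
                "Effect": "Allow",
                "Action": actions,
                "Resource": "*",  # assumption; narrow this if appropriate
            }
        ],
    }


def policy_as_json(policy: dict) -> str:
    """Format the policy document for review or for pasting into IAM."""
    return json.dumps(policy, indent=2)
```

For example, `policy_as_json(build_policy(use_console=True, org_admin=True))` produces a document that covers all eleven actions.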

Configuration options for organizations

If an account is part of an organization that centrally manages multiple Amazon Macie accounts, the Macie administrator for the organization configures and manages automated sensitive data discovery for accounts in the organization. This includes settings that define the scope and nature of the analyses that Macie performs for the accounts. Members can't access these settings for their own accounts.

If you're the Macie administrator for an organization, you can define the scope of the analyses in several ways:

  • Automatically enable automated sensitive data discovery for accounts – When you enable automated sensitive data discovery, you specify whether to enable it automatically for all existing accounts and new member accounts, only for new member accounts, or no accounts. If you enable it automatically for new member accounts, it's enabled for any account that subsequently joins your organization, when the account joins your organization in Macie. If it's enabled for an account, Macie includes S3 buckets that the account owns. If it's disabled for an account, Macie excludes buckets that the account owns.

  • Selectively enable automated sensitive data discovery for accounts – With this option, you enable or disable automated sensitive data discovery for individual accounts on a case-by-case basis. If you enable it for an account, Macie includes S3 buckets that the account owns. If you don't enable it or you disable it for an account, Macie excludes buckets that the account owns.

  • Exclude specific S3 buckets from automated sensitive data discovery – If you enable automated sensitive data discovery for one or more accounts, you can exclude particular S3 buckets that the accounts own. Macie then skips the buckets when it performs automated discovery for your organization. To exclude particular buckets, add them to the bucket exclusion list in the configuration settings for your administrator account. You can exclude as many as 1,000 buckets for your organization.

By default, automated sensitive data discovery is enabled automatically for all new and existing accounts in an organization. In addition, Macie includes all the S3 buckets that the accounts own. If you keep the default settings, Macie performs automated discovery for all the buckets that it monitors and analyzes for your administrator account, which includes all the buckets that your member accounts own.

As a Macie administrator, you also define the nature of the analyses that Macie performs for your organization. You do this by configuring additional settings for your administrator account: the managed data identifiers, custom data identifiers, and allow lists that you want Macie to use when it analyzes S3 objects. Macie uses the settings for your administrator account when it analyzes S3 objects for other accounts in your organization.

Enabling automated sensitive data discovery

When you enable automated sensitive data discovery, Amazon Macie begins evaluating your Amazon S3 inventory data and performing other automated discovery activities for your account in the current AWS Region. If you're the Macie administrator for an organization, by default this includes S3 buckets that your member accounts own. Depending on the size of your Amazon S3 data estate, sensitive data discovery statistics and other results can begin to appear within 48 hours.

To enable automated sensitive data discovery for an account or organization, you can use the Amazon Macie console or the Amazon Macie API. To enable it by using the console, follow these steps. To enable it programmatically, use the following operations of the Amazon Macie API: BatchUpdateAutomatedDiscoveryAccounts, for individual accounts in an organization, or UpdateAutomatedDiscoveryConfiguration, for an organization, a Macie administrator account, or a standalone Macie account.

To enable automated sensitive data discovery
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to enable automated sensitive data discovery.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

  4. If you have a standalone Macie account, choose Enable in the Status section.

  5. If you're the Macie administrator for an organization, choose an option in the Status section to specify the accounts to enable automated sensitive data discovery for:

    • To enable it for all the accounts in your organization, choose Enable. In the dialog box that appears, choose My organization. To also enable it automatically for accounts that subsequently join your organization, select Enable automatically for new accounts. When you finish, choose Enable.

    • To enable it only for particular member accounts, choose Manage accounts. Then, in the table on the Accounts page, select the check box for each account that you want to enable it for. When you finish, choose Enable automated sensitive data discovery on the Actions menu.

    • To enable it only for your Macie administrator account, choose Enable. In the dialog box that appears, choose My account and clear Enable automatically for new accounts. When you finish, choose Enable.

    To subsequently check or change the status of automated sensitive data discovery for individual accounts in your organization, choose Accounts in the navigation pane. On the Accounts page, the Automated sensitive data discovery field in the table indicates the current status of automated discovery for an account. To change the status for an account, select the account, and then use the Actions menu to enable or disable automated discovery for the account.

After you enable automated sensitive data discovery, review and configure your settings to refine the analyses that Macie performs.
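If you prefer to enable automated discovery programmatically, the following sketch shows one way to call UpdateAutomatedDiscoveryConfiguration with Boto3. It assumes that the `autoEnableOrganizationMembers` parameter accepts the values `ALL`, `NEW`, and `NONE`, which correspond to the options for existing and new member accounts, only new member accounts, or no accounts. The function name is hypothetical.

```python
from typing import Optional

AUTO_ENABLE_MODES = {"ALL", "NEW", "NONE"}


def enable_automated_discovery(client, auto_enable: Optional[str] = None) -> None:
    """Enable automated sensitive data discovery in the client's Region.

    client is expected to be a Boto3 Macie client, for example
    boto3.client("macie2"). For a standalone account, leave auto_enable
    as None. For an organization, the Macie administrator can pass
    "ALL", "NEW", or "NONE" to control automatic enablement for members.
    """
    params = {"status": "ENABLED"}
    if auto_enable is not None:
        if auto_enable not in AUTO_ENABLE_MODES:
            raise ValueError(f"auto_enable must be one of {sorted(AUTO_ENABLE_MODES)}")
        params["autoEnableOrganizationMembers"] = auto_enable
    client.update_automated_discovery_configuration(**params)
```

To enable automated discovery only for particular member accounts, use the BatchUpdateAutomatedDiscoveryAccounts operation instead.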

Configuring automated sensitive data discovery settings

If you enable automated sensitive data discovery for your account or organization, you can adjust your automated discovery settings to refine the analyses that Amazon Macie performs. These settings specify S3 buckets to exclude from analyses. They also specify the types and occurrences of sensitive data to detect and report—the managed data identifiers, custom data identifiers, and allow lists to use when analyzing S3 objects.

By default, Macie performs automated sensitive data discovery for all the S3 general purpose buckets that it monitors and analyzes for your account. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. You can exclude specific buckets from the analyses. For example, you might exclude buckets that typically store AWS logging data, such as AWS CloudTrail event logs. If you exclude a bucket, you can subsequently include it again.

In addition, Macie analyzes S3 objects by using only the set of managed data identifiers that we recommend for automated sensitive data discovery. Macie doesn't use custom data identifiers or allow lists that you've defined. To customize the analyses, you can configure Macie to use specific managed data identifiers, custom data identifiers, and allow lists.

The following sections provide additional information about each type of setting. They also explain how to change a setting by using the Amazon Macie console. To review or change the settings programmatically, you can use the following operations of the Amazon Macie API: UpdateClassificationScope, to specify S3 buckets to exclude from analyses, and UpdateSensitivityInspectionTemplate, to specify which managed data identifiers, custom data identifiers, and allow lists to use.

If you change a setting, Macie applies your change when the next evaluation and analysis cycle starts for automated sensitive data discovery, typically within 24 hours.

By default, Macie performs automated sensitive data discovery for all the S3 general purpose buckets that it monitors and analyzes for your account. If you're the Macie administrator for an organization, this includes buckets that your member accounts own.

To refine the scope, you can exclude as many as 1,000 S3 buckets from the analyses. If you exclude a bucket, Macie stops selecting and analyzing objects in the bucket when it performs automated sensitive data discovery. Existing sensitive data discovery statistics and details for the bucket persist—for example, the bucket's current sensitivity score remains unchanged. After you exclude a bucket, you can subsequently include it again.

To exclude or include specific S3 buckets
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to exclude or include specific S3 buckets in automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

    The Automated sensitive data discovery page appears and displays your current settings. On that page, the S3 buckets section lists S3 buckets that are currently excluded, or it indicates that all buckets are currently included.

  4. In the S3 buckets section, choose Edit.

  5. Do one of the following:

    • To exclude one or more S3 buckets, choose Add buckets to the exclude list. Then, in the S3 buckets table, select the check box for each bucket that you want to exclude. The table lists all the general purpose buckets for your account or organization in the current Region.

    • To include one or more S3 buckets that you previously excluded, choose Remove buckets from the exclude list. Then, in the S3 buckets table, select the check box for each bucket that you want to include. The table lists all the buckets that are currently excluded from automated discovery analyses.

    To find specific buckets more easily, enter search criteria in the search box above the table. You can also sort the table by choosing a column heading.

  6. When you finish selecting buckets, choose Add or Remove, depending on the option that you chose in the preceding step.
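The preceding steps can also be approximated programmatically with the UpdateClassificationScope operation. The following sketch assumes that the classification scope ID is available from GetAutomatedDiscoveryConfiguration and that the request accepts an `operation` value of `ADD`, `REMOVE`, or `REPLACE`. The function name is hypothetical, and the size check covers only the names in a single request, not the total number of excluded buckets.

```python
from typing import Iterable

MAX_EXCLUDED_BUCKETS = 1000  # limit described in this topic
OPERATIONS = {"ADD", "REMOVE", "REPLACE"}


def change_excluded_buckets(
    client, bucket_names: Iterable[str], operation: str = "ADD"
) -> str:
    """Add buckets to, or remove buckets from, the exclusion list.

    client is expected to be a Boto3 Macie client for the applicable Region.
    Returns the ID of the classification scope that was updated.
    """
    if operation not in OPERATIONS:
        raise ValueError(f"operation must be one of {sorted(OPERATIONS)}")
    names = sorted(set(bucket_names))  # drop duplicate names
    if len(names) > MAX_EXCLUDED_BUCKETS:
        raise ValueError("You can exclude as many as 1,000 buckets.")
    config = client.get_automated_discovery_configuration()
    scope_id = config["classificationScopeId"]
    client.update_classification_scope(
        id=scope_id,
        s3={"excludes": {"bucketNames": names, "operation": operation}},
    )
    return scope_id
```

To include a bucket again, call the function with the bucket name and `operation="REMOVE"`.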

A managed data identifier is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. By default, Macie analyzes S3 objects by using the set of managed data identifiers that we recommend for automated sensitive data discovery. To review a list of these identifiers, see Default settings for automated sensitive data discovery.

You can tailor the analyses to focus on specific types of sensitive data:

  • Add managed data identifiers for the types of sensitive data that you want Macie to detect and report, and

  • Remove managed data identifiers for the types of sensitive data that you don't want Macie to detect and report.

If you remove a managed data identifier, your change doesn't affect existing sensitive data discovery statistics and details for S3 buckets. For example, if you remove the managed data identifier for AWS secret access keys and Macie previously detected that type of data in a bucket, Macie continues to report those detections for the bucket.

Tip

Instead of removing a managed data identifier, which affects subsequent analyses of all S3 buckets, you can exclude its detections from sensitivity scores for particular buckets. For more information, see Managing automated sensitive data discovery for individual S3 buckets.

To add or remove managed data identifiers
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove managed data identifiers from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

    The Automated sensitive data discovery page appears and displays your current settings. On that page, the Managed data identifiers section displays your current settings, organized into two tabs:

    • Added to default – This tab lists managed data identifiers that you added. Macie uses these identifiers in addition to the ones that are in the default set and that you haven't removed.

    • Removed from default – This tab lists managed data identifiers that you removed. Macie doesn't use these identifiers.

  4. In the Managed data identifiers section, choose Edit.

  5. Do any of the following:

    • To add one or more managed data identifiers, choose the Added to default tab. Then, in the table, select the check box for each managed data identifier to add. If a check box is already selected, you already added that identifier.

    • To remove one or more managed data identifiers, choose the Removed from default tab. Then, in the table, select the check box for each managed data identifier to remove. If a check box is already selected, you already removed that identifier.

    On each tab, the table displays a list of all the managed data identifiers that Macie currently provides. In the table, the first column specifies each managed data identifier's ID. The ID describes the type of sensitive data that an identifier is designed to detect—for example, USA_PASSPORT_NUMBER for US passport numbers. To find specific managed data identifiers more easily, enter search criteria in the search box above the table. You can also sort the table by choosing a column heading. For details about each identifier, see Using managed data identifiers.

  6. When you finish, choose Save.
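The same change can be sketched with the UpdateSensitivityInspectionTemplate operation. The following example assumes that the template's `includes` settings hold the identifiers that you added to the default set, that its `excludes` settings hold the identifiers that you removed, and that an update overwrites the template's current settings. For that reason, it reads the current template first and preserves the other settings. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def update_managed_identifiers(
    client, add: Iterable[str] = (), remove: Iterable[str] = ()
) -> Tuple[dict, dict]:
    """Add or remove managed data identifiers for automated discovery.

    client is expected to be a Boto3 Macie client for the applicable Region.
    Returns the includes and excludes settings that were submitted.
    """
    to_add, to_remove = set(add), set(remove)
    if to_add & to_remove:
        raise ValueError("An identifier can't be both added and removed.")

    config = client.get_automated_discovery_configuration()
    template_id = config["sensitivityInspectionTemplateId"]
    current = client.get_sensitivity_inspection_template(id=template_id)

    # Copy the current settings so that custom data identifiers and
    # allow lists are carried over unchanged.
    includes = dict(current.get("includes", {}))
    excludes = dict(current.get("excludes", {}))
    added = set(includes.get("managedDataIdentifierIds", []))
    removed = set(excludes.get("managedDataIdentifierIds", []))

    includes["managedDataIdentifierIds"] = sorted((added | to_add) - to_remove)
    excludes["managedDataIdentifierIds"] = sorted((removed | to_remove) - to_add)

    client.update_sensitivity_inspection_template(
        id=template_id, includes=includes, excludes=excludes
    )
    return includes, excludes
```

For example, `update_managed_identifiers(client, remove=["USA_PASSPORT_NUMBER"])` stops subsequent analyses from detecting US passport numbers. Existing statistics and details for buckets aren't affected.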

A custom data identifier is a set of criteria that you define to detect sensitive data. The criteria consist of a regular expression (regex) that defines a text pattern to match and, optionally, character sequences and a proximity rule that refine the results. To learn more, see Building custom data identifiers.

By default, Amazon Macie doesn't use custom data identifiers when it performs automated sensitive data discovery. If you want Macie to use specific custom data identifiers, you can add them to the analyses. Macie then uses the custom data identifiers in addition to any managed data identifiers that you configure Macie to use.

If you add a custom data identifier, you can subsequently remove it. Your change doesn't affect existing sensitive data discovery statistics and details for S3 buckets. For example, if you remove a custom data identifier that previously produced detections for a bucket, Macie continues to report those detections for the bucket. However, instead of removing the identifier, which affects subsequent analyses of all buckets, consider excluding its detections from sensitivity scores for only particular buckets. For more information, see Managing automated sensitive data discovery for individual S3 buckets.

To add or remove custom data identifiers
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove custom data identifiers from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

    The Automated sensitive data discovery page appears and displays your current settings. On that page, the Custom data identifiers section lists custom data identifiers that you added, or it indicates that you haven't selected any custom data identifiers.

  4. In the Custom data identifiers section, choose Edit.

  5. Do any of the following:

    • To add one or more custom data identifiers, select the check box for each custom data identifier to add. If a check box is already selected, you already added that identifier.

    • To remove one or more custom data identifiers, clear the check box for each custom data identifier to remove. If a check box is already cleared, Macie doesn't currently use that identifier.

    Tip

    To review or test the settings for a custom data identifier before you add or remove it, choose the link icon (a box with an arrow) next to the identifier's name. Macie opens a page that displays the identifier's settings. To also test the identifier with sample data, enter up to 1,000 characters of text in the Sample data box on that page. Then choose Test. Macie evaluates the sample data and reports the number of matches.

  6. When you finish, choose Save.

In Amazon Macie, an allow list defines specific text or a text pattern that you want Macie to ignore when it inspects S3 objects for sensitive data. If text matches an entry or pattern in an allow list, Macie doesn’t report the text. This is the case even if the text matches the criteria of a managed or custom data identifier. To learn more, see Defining sensitive data exceptions with allow lists.

By default, Macie doesn't use allow lists when it performs automated sensitive data discovery. If you want Macie to use specific allow lists, you can add them to the analyses. If you add an allow list, you can subsequently remove it.

To add or remove allow lists
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to add or remove allow lists from automated discovery analyses.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

    The Automated sensitive data discovery page appears and displays your current settings. On that page, the Allow lists section specifies allow lists that you already added, or it indicates that you haven't selected any allow lists.

  4. In the Allow lists section, choose Edit.

  5. Do any of the following:

    • To add one or more allow lists, select the check box for each allow list to add. If a check box is already selected, you already added that list.

    • To remove one or more allow lists, clear the check box for each allow list to remove. If a check box is already cleared, Macie doesn't currently use that list.

    Tip

    To review the settings for an allow list before you add or remove it, choose the link icon (a box with an arrow) next to the list's name. Macie opens a page that displays the list's settings. If the list specifies a regular expression (regex), you can also use this page to test the regex with sample data. To do this, enter up to 1,000 characters of text in the Sample data box, and then choose Test. Macie evaluates the sample data and reports the number of matches.

  6. When you finish, choose Save.
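To add or remove allow lists programmatically, you can update the `allowListIds` setting of the sensitivity inspection template. The following sketch assumes that an update overwrites the template's current settings, so it reads the template first and resubmits the other settings unchanged. The function name is hypothetical. The same pattern could apply to the `customDataIdentifierIds` setting.

```python
from typing import Iterable, List


def update_allow_lists(
    client, add: Iterable[str] = (), remove: Iterable[str] = ()
) -> List[str]:
    """Add or remove allow lists for automated discovery analyses.

    client is expected to be a Boto3 Macie client for the applicable Region.
    Returns the allow list IDs that were submitted.
    """
    config = client.get_automated_discovery_configuration()
    template_id = config["sensitivityInspectionTemplateId"]
    current = client.get_sensitivity_inspection_template(id=template_id)

    includes = dict(current.get("includes", {}))
    to_remove = set(remove)
    # Keep the existing order, minus the lists to remove.
    ids = [i for i in includes.get("allowListIds", []) if i not in to_remove]
    for allow_list_id in add:
        if allow_list_id not in ids:
            ids.append(allow_list_id)
    includes["allowListIds"] = ids

    client.update_sensitivity_inspection_template(
        id=template_id,
        includes=includes,
        excludes=current.get("excludes", {}),
    )
    return ids
```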

Disabling automated sensitive data discovery

You can disable automated sensitive data discovery for an account or organization at any time. If you do this, Macie stops performing all automated discovery activities for the account or organization before a subsequent evaluation and analysis cycle starts, typically within 48 hours. Additional effects vary:

  • If you disable it for an account in your organization, you can continue to access all statistical data, inventory data, and other information that Macie produced and directly provided while performing automated discovery for the account. You can also enable automated discovery for the account again. Macie then resumes all automated discovery activities for the account.

  • If you disable it for your organization or a standalone Macie account, you lose access to all statistical data, inventory data, and other information that Macie produced and directly provided while performing automated discovery for your organization or account. For example, your S3 bucket inventory no longer includes sensitivity visualizations or analysis statistics. You can subsequently enable it again. Macie then resumes all automated discovery activities for your organization or account. If you re-enable it within 30 days, you regain access to all the data and information that Macie previously produced and directly provided while performing automated discovery. If you don't re-enable it within 30 days, Macie permanently deletes this data and information.

You can continue to access sensitive data findings that Macie produced while performing automated sensitive data discovery for your organization or account. Macie stores findings for 90 days. In addition, data that you stored or published to other AWS services remains intact and isn't affected, such as sensitive data discovery results in Amazon S3 and finding events in Amazon EventBridge.
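To keep track of the 30-day window for an organization or a standalone account, you can estimate the deadline from the time when automated discovery was disabled. The following sketch assumes that you obtain that time from the `disabledAt` value that GetAutomatedDiscoveryConfiguration returns, and that the value is a time zone-aware datetime. The result is an approximation; Macie determines the exact time of deletion.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

REENABLE_WINDOW = timedelta(days=30)  # window described in this topic


def reenable_deadline(
    disabled_at: datetime, now: Optional[datetime] = None
) -> Tuple[datetime, timedelta]:
    """Estimate the deadline for regaining access to existing data.

    Returns the approximate deadline and the time remaining, which is
    zero if the deadline has passed.
    """
    now = now or datetime.now(timezone.utc)
    deadline = disabled_at + REENABLE_WINDOW
    remaining = max(deadline - now, timedelta(0))
    return deadline, remaining
```

For example, if automated discovery was disabled on January 1, the estimated deadline is January 31.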

To disable automated sensitive data discovery, you can use the Amazon Macie console or the Amazon Macie API. To disable it by using the console, follow these steps. To disable it programmatically, use the following operations of the Amazon Macie API: BatchUpdateAutomatedDiscoveryAccounts, for individual accounts in an organization, or UpdateAutomatedDiscoveryConfiguration, for an organization, a Macie administrator account, or a standalone Macie account.

To disable automated sensitive data discovery
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you want to disable automated sensitive data discovery.

  3. In the navigation pane, under Settings, choose Automated sensitive data discovery.

  4. If you're the Macie administrator for an organization, choose an option in the Status section to specify the accounts to disable automated sensitive data discovery for:

    • To disable it only for particular member accounts, choose Manage accounts. Then, in the table on the Accounts page, select the check box for each account that you want to disable it for. When you finish, choose Disable automated sensitive data discovery on the Actions menu.

    • To disable it only for your Macie administrator account, choose Disable. In the dialog box that appears, choose My account, and then choose Disable.

    • To disable it for all the accounts in your organization and your organization overall, choose Disable. In the dialog box that appears, choose My organization, and then choose Disable.

  5. If you have a standalone Macie account, choose Disable in the Status section.
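The following sketch shows how the two API operations mentioned above might be used to disable automated discovery. It assumes a Boto3 Macie client, that BatchUpdateAutomatedDiscoveryAccounts reports failures in an `errors` list, and that ListAutomatedDiscoveryAccounts accepts an `accountIds` filter. The function names are hypothetical.

```python
from typing import Iterable


def disable_automated_discovery(client) -> None:
    """Disable automated discovery for an organization, a Macie
    administrator account, or a standalone account in the client's Region."""
    client.update_automated_discovery_configuration(status="DISABLED")


def disable_for_accounts(client, account_ids: Iterable[str]) -> dict:
    """Disable automated discovery for particular member accounts.

    Returns the account IDs that the request failed for, and the status
    that Macie subsequently reports for each account.
    """
    ids = list(account_ids)
    response = client.batch_update_automated_discovery_accounts(
        accounts=[{"accountId": i, "status": "DISABLED"} for i in ids]
    )
    failed = [error.get("accountId") for error in response.get("errors", [])]

    # Retrieve the reported status for the accounts, one page at a time.
    statuses = {}
    params = {"accountIds": ids}
    while True:
        page = client.list_automated_discovery_accounts(**params)
        for item in page.get("items", []):
            statuses[item["accountId"]] = item["status"]
        token = page.get("nextToken")
        if not token:
            break
        params["nextToken"] = token
    return {"failed": failed, "statuses": statuses}
```

As with the console, these calls affect only the Region that the client was created for.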