Storing and retaining sensitive data discovery results with Amazon Macie

Amazon Macie creates a record for each Amazon Simple Storage Service (Amazon S3) object that it analyzes when you run sensitive data discovery jobs or perform automated sensitive data discovery. These records, referred to as sensitive data discovery results, log details about the analysis that Macie performs on individual S3 objects. This includes:

  • Objects in which Macie finds sensitive data, and which therefore also produce sensitive data findings.

  • Objects in which Macie doesn't find sensitive data, and which therefore don't produce sensitive data findings.

  • Objects that Macie can't analyze due to issues such as use of an unsupported format or permissions settings.

If Macie finds sensitive data in an S3 object, the sensitive data discovery result includes data from the corresponding finding. It provides additional information too, such as the location of as many as 1,000 occurrences of each type of sensitive data that Macie found in the object. A sensitive data discovery result doesn't include the sensitive data that Macie found. Instead, it provides you with an analysis record that can be helpful for data privacy and protection audits or investigations.

Macie stores your sensitive data discovery results for 90 days. To access your results and enable long-term storage and retention of them, configure Macie to store the results in an S3 bucket and encrypt them with an AWS Key Management Service (AWS KMS) key. If you do this, Macie writes your sensitive data discovery results to JSON Lines (.jsonl) files, which it adds to the S3 bucket as GNU Zip (.gz) files. The S3 bucket can then serve as a definitive, long-term repository for all of your sensitive data discovery results.
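As a rough illustration of the file format described above, the following Python sketch reads one gzip-compressed JSON Lines file after you download it from the bucket. The function name is hypothetical, and the sketch deliberately makes no assumptions about which fields each record contains.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def read_discovery_results(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a .jsonl.gz file."""
    # Text mode ("rt") decompresses and decodes the file while reading it.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # JSON Lines files hold one JSON document per line
                yield json.loads(line)
```

Because the function is a generator, it processes a large results file one record at a time instead of loading the whole file into memory.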

This topic guides you through the process of using the AWS Management Console to configure this type of repository for your discovery results. The configuration is a combination of an S3 bucket that stores the results, an AWS KMS key that encrypts the results, and Macie settings that indicate which bucket and key to use. If you prefer to configure the Macie settings programmatically, you can use the PutClassificationExportConfiguration operation of the Amazon Macie API.
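For the programmatic route, a minimal sketch with the AWS SDK for Python (Boto3) might look like the following. The helper names are hypothetical, and the bucket name, prefix, and key ARN that you pass in are your own values. The second function needs Boto3 and valid credentials, so it imports the SDK only when it's called.

```python
from typing import Any, Dict, Optional


def build_export_configuration(
    bucket_name: str, kms_key_arn: str, key_prefix: Optional[str] = None
) -> Dict[str, Any]:
    """Build the request body for PutClassificationExportConfiguration."""
    destination: Dict[str, str] = {
        "bucketName": bucket_name,
        "kmsKeyArn": kms_key_arn,
    }
    if key_prefix:  # the prefix is optional
        destination["keyPrefix"] = key_prefix
    return {"s3Destination": destination}


def put_export_configuration(configuration: Dict[str, Any], region: str) -> None:
    """Send the settings to Macie in one Region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("macie2", region_name=region)
    client.put_classification_export_configuration(configuration=configuration)
```

Separating the builder from the API call makes it possible to review or log the settings before anything is sent to Macie.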

When you configure the settings in Macie, your choices apply only to the current AWS Region. If you're the Macie administrator for an organization, your choices apply only to your account. They don't apply to any associated member accounts.

If you use Macie in multiple Regions, configure the repository settings for each Region in which you use Macie. If you prefer to store all discovery results for all Regions in one S3 bucket, you can do this by choosing the same bucket, located in one specific Region, for each Region in which you use Macie.
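Because the settings are stored per Region, one way to apply the same bucket in several Regions is to repeat the call with a client for each Region. The following sketch assumes a hypothetical client_for_region callable that returns a Macie client for a given Region. With Boto3, that callable could be lambda region: boto3.client("macie2", region_name=region).

```python
from typing import Any, Callable, Dict, Iterable, List


def configure_regions(
    regions: Iterable[str],
    configuration: Dict[str, Any],
    client_for_region: Callable[[str], Any],
) -> List[str]:
    """Apply the same repository settings in each Region.

    Returns the Regions that were configured, in order.
    """
    configured = []
    for region in regions:
        client = client_for_region(region)
        # The same bucket and key are sent to Macie in every Region.
        client.put_classification_export_configuration(configuration=configuration)
        configured.append(region)
    return configured
```

Passing the client factory as a parameter keeps the loop independent of credentials, which also makes it easy to exercise with a stand-in client.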

Step 1: Verify your permissions

Before you configure a repository for your sensitive data discovery results, verify that you have the permissions that you need. You can do this by using the AWS Identity and Access Management (IAM) console:

  1. Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/.

  2. In the navigation pane, choose Users.

  3. Choose your user name.

The Permissions tab lists all the IAM policies that are attached to your user name. Choose a policy to show its details. Then compare the information in the policy to the following list of actions that you must be allowed to perform to configure the repository.

Macie

For Macie, verify that you're allowed to perform the following action:

macie2:PutClassificationExportConfiguration

This action allows you to add or change the repository settings in Macie.

Amazon S3

For Amazon S3, verify that you're allowed to perform the following actions:

  • s3:CreateBucket

  • s3:GetBucketLocation

  • s3:ListAllMyBuckets

  • s3:PutBucketAcl

  • s3:PutBucketPolicy

  • s3:PutBucketPublicAccessBlock

  • s3:PutObject

These actions allow you to access and configure an S3 bucket that can serve as the repository.

AWS KMS

To use the Amazon Macie console to add or change the repository settings, also verify that you're allowed to perform the following AWS KMS actions:

  • kms:DescribeKey

  • kms:ListAliases

These actions allow you to retrieve information about AWS KMS keys that can encrypt data in the repository. If you plan to create a new AWS KMS key to encrypt the data, you also need to be allowed to perform the following actions: kms:CreateKey, kms:GetKeyPolicy, and kms:PutKeyPolicy.

If you're not allowed to perform one or more of the preceding actions, ask your AWS administrator for assistance before you proceed to the next step.
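If you have the JSON for your attached policies, the comparison described in this step can be roughly automated. The following sketch is a simplification with hypothetical function names: it looks only at Allow statements and their Action elements, and it ignores Deny statements, NotAction, resources, conditions, permissions boundaries, and organization-level policies. Treat an empty result as a hint, not as proof that you have access.

```python
import fnmatch
from typing import Any, Dict, Iterable, List

# Actions listed in this step (console workflow, existing KMS key).
REQUIRED_ACTIONS = [
    "macie2:PutClassificationExportConfiguration",
    "s3:CreateBucket", "s3:GetBucketLocation", "s3:ListAllMyBuckets",
    "s3:PutBucketAcl", "s3:PutBucketPolicy", "s3:PutBucketPublicAccessBlock",
    "s3:PutObject",
    "kms:DescribeKey", "kms:ListAliases",
]


def _as_list(value: Any) -> List[Any]:
    """Policy elements may be a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def allowed_patterns(policies: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect the Action patterns from every Allow statement."""
    patterns: List[str] = []
    for policy in policies:
        for statement in _as_list(policy.get("Statement", [])):
            if statement.get("Effect") == "Allow" and "Action" in statement:
                patterns.extend(_as_list(statement["Action"]))
    return patterns


def missing_actions(
    policies: Iterable[Dict[str, Any]],
    required: Iterable[str] = REQUIRED_ACTIONS,
) -> List[str]:
    """Return the required actions that no Allow pattern matches."""
    # Action names are compared case-insensitively; "*" and "?" are wildcards.
    patterns = [p.lower() for p in allowed_patterns(policies)]
    return [
        action for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]
```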

Step 2: Choose an AWS KMS key and update the key policy

After you verify your permissions, determine which AWS KMS key you want Macie to use to encrypt your sensitive data discovery results. The key must be a customer managed, symmetric encryption KMS key that's in the same AWS Region as the S3 bucket where you want to store the results.

The key can be an existing KMS key from your own account, or an existing KMS key that another account owns. If you want to use a new KMS key, create the key before proceeding. If you want to use an existing key that another account owns, obtain the Amazon Resource Name (ARN) of the key. You'll need to enter this ARN when you configure the repository settings in Macie. For information about creating and reviewing the settings for KMS keys, see Managing keys in the AWS Key Management Service Developer Guide.
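Because the key must be in the same Region as the bucket, it can help to check the Region embedded in a key ARN before you enter it. The following sketch uses hypothetical helper names and inspects only the ARN text. It can't tell whether the key is customer managed or symmetric, which requires a call such as the AWS KMS DescribeKey operation.

```python
from typing import NamedTuple


class KmsKeyArn(NamedTuple):
    partition: str
    region: str
    account_id: str
    key_id: str


def parse_kms_key_arn(arn: str) -> KmsKeyArn:
    """Split an ARN of the form arn:partition:kms:region:account:key/key-id."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "kms":
        raise ValueError(f"not a KMS ARN: {arn!r}")
    resource_type, _, key_id = parts[5].partition("/")
    if resource_type != "key" or not key_id:
        raise ValueError(f"not a KMS key ARN: {arn!r}")
    return KmsKeyArn(parts[1], parts[3], parts[4], key_id)


def key_matches_bucket_region(key_arn: str, bucket_region: str) -> bool:
    """Return True if the key ARN names the same Region as the bucket."""
    return parse_kms_key_arn(key_arn).region == bucket_region
```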

After you determine which KMS key you want Macie to use, give Macie permission to use the key. Otherwise, Macie won't be able to encrypt or store discovery results in the repository. To give Macie permission to use the key, change the key policy for the key. For detailed information about key policies and managing access to KMS keys, see Key policies in AWS KMS in the AWS Key Management Service Developer Guide.

To change the key policy

  1. Open the AWS KMS console at https://console.aws.amazon.com/kms.

  2. To change the AWS Region, use the Region selector in the upper-right corner of the page.

  3. Choose the key that you want to use to encrypt the results.

  4. On the Key policy tab, choose Edit.

  5. Copy the following statement to your clipboard and then add it to the policy:

    {
        "Sid": "Allow Macie to use the key",
        "Effect": "Allow",
        "Principal": {
            "Service": "macie.amazonaws.com"
        },
        "Action": [
            "kms:GenerateDataKey",
            "kms:Encrypt"
        ],
        "Resource": "*",
        "Condition": {
            "StringEquals": {
                "aws:SourceAccount": "111122223333"
            },
            "ArnLike": {
                "aws:SourceArn": [
                    "arn:aws:macie2:Region:111122223333:export-configuration:*",
                    "arn:aws:macie2:Region:111122223333:classification-job/*"
                ]
            }
        }
    }

    When you add the statement, make sure that the syntax is valid. Policies use JSON format. This means that you need to also add a comma before or after the statement, depending on where you add the statement to the policy. If you add the statement as the last statement, add a comma after the closing curly brace for the preceding statement. If you add it as the first statement or between two existing statements, add a comma after the closing curly brace for the statement.

  6. Update the statement with the correct values for your environment:

    • In the Condition fields, replace the placeholder values, where:

      • 111122223333 is the account ID for your AWS account.

      • Region is the AWS Region in which you're using Macie and want to allow Macie to use the key.

        If you use Macie in multiple Regions and want to allow Macie to use the key in additional Regions, add aws:SourceArn conditions for each additional Region. For example:

        "aws:SourceArn": [
            "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
            "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
            "arn:aws:macie2:us-west-2:111122223333:export-configuration:*",
            "arn:aws:macie2:us-west-2:111122223333:classification-job/*"
        ]

        Alternatively, you can allow Macie to use the key in all Regions. To do this, replace the placeholder value with the wildcard character (*). For example:

        "aws:SourceArn": [
            "arn:aws:macie2:*:111122223333:export-configuration:*",
            "arn:aws:macie2:*:111122223333:classification-job/*"
        ]

    • If you're using Macie in a manually enabled AWS Region, add the appropriate Region code to the value for the Service field. For example, if you're using Macie in the Middle East (Bahrain) Region, which has the Region code me-south-1, replace macie.amazonaws.com with macie.me-south-1.amazonaws.com.

    Note that the Condition fields use two IAM global condition keys:

    • aws:SourceAccount – This condition allows Macie to perform the specified actions only for your account. More specifically, it determines which account can perform the specified actions for the resources and actions specified by the aws:SourceArn condition.

      To allow Macie to perform the specified actions for additional accounts, add the account ID for each additional account to this condition. For example:

      "aws:SourceAccount": ["111122223333", "444455556666"]

    • aws:SourceArn – This condition prevents other AWS services from performing the specified actions. It also prevents Macie from using the key while performing other actions for your account. In other words, it allows Macie to encrypt S3 objects with the key only if the objects are sensitive data discovery results, and only if those results are for sensitive data discovery jobs that are created by the account and in the Region specified in the condition.

      To allow Macie to perform the specified actions for additional accounts, add ARNs for each additional account to this condition. For example:

      "aws:SourceArn": [
          "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
          "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
          "arn:aws:macie2:us-east-1:444455556666:export-configuration:*",
          "arn:aws:macie2:us-east-1:444455556666:classification-job/*"
      ]

      The accounts specified by the aws:SourceAccount and aws:SourceArn conditions should match.

    These conditions help prevent Macie from being used as a confused deputy during transactions with AWS KMS. Although we don't recommend it, you can remove these conditions from the statement.

  7. When you finish adding and updating the statement, choose Save changes.
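If you'd rather generate the statement than edit JSON by hand, the procedure above can be sketched in Python as follows. The function names are hypothetical. The sketch builds the statement for any number of Regions and accounts, then merges it into an existing key policy document, which avoids the comma placement issues that come with manual edits. The opt_in_region parameter is an illustrative way to produce the Region-specific service principal for a manually enabled Region.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def macie_source_arns(regions: Sequence[str], account_ids: Sequence[str]) -> List[str]:
    """List the aws:SourceArn values for each account and Region pair."""
    arns: List[str] = []
    for account_id in account_ids:
        for region in regions:  # pass ["*"] to allow all Regions
            base = f"arn:aws:macie2:{region}:{account_id}"
            arns.append(f"{base}:export-configuration:*")
            arns.append(f"{base}:classification-job/*")
    return arns


def macie_key_statement(
    regions: Sequence[str],
    account_ids: Sequence[str],
    opt_in_region: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the key policy statement that lets Macie use the key."""
    service = (
        f"macie.{opt_in_region}.amazonaws.com" if opt_in_region
        else "macie.amazonaws.com"
    )
    accounts = list(account_ids)
    return {
        "Sid": "Allow Macie to use the key",
        "Effect": "Allow",
        "Principal": {"Service": service},
        "Action": ["kms:GenerateDataKey", "kms:Encrypt"],
        "Resource": "*",
        "Condition": {
            # Account IDs are kept as strings so that leading zeros survive.
            "StringEquals": {
                "aws:SourceAccount": accounts[0] if len(accounts) == 1 else accounts
            },
            "ArnLike": {"aws:SourceArn": macie_source_arns(regions, accounts)},
        },
    }


def add_statement(policy_json: str, statement: Dict[str, Any]) -> str:
    """Return the policy with the statement appended, replacing any same-Sid entry."""
    policy = json.loads(policy_json)
    existing = policy.get("Statement", [])
    if isinstance(existing, dict):  # a lone statement may not be wrapped in a list
        existing = [existing]
    kept = [s for s in existing if s.get("Sid") != statement.get("Sid")]
    policy["Statement"] = kept + [statement]
    return json.dumps(policy, indent=4)
```

Review the resulting JSON before you save it in the AWS KMS console or submit it with the AWS KMS PutKeyPolicy operation.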

Step 3: Specify the S3 bucket to use

After you verify your permissions and choose the AWS KMS key to use, you're ready to specify which S3 bucket you want to use as the repository for your sensitive data discovery results. You have two options:

  • Use a new S3 bucket that Macie creates – If you choose this option, Macie automatically creates a new S3 bucket for your discovery results. Macie also applies a bucket policy to the bucket. The policy allows Macie to add (put) objects to the bucket. To review this policy, choose View policy on the Amazon Macie console after you enter a name for the bucket.

  • Use an existing S3 bucket that you create – If you prefer to store your discovery results in a particular S3 bucket that you create, create the bucket before you proceed. Then check the bucket's settings and update the bucket's policy to ensure that Macie can add (put) objects to the bucket. This topic explains which setting to check and how to update the policy. It also provides examples of the statements to add to the policy.

The following sections provide step-by-step instructions for each option. Choose the section for the option that you want.

Use a new S3 bucket that Macie creates

If you prefer to use a new S3 bucket that Macie creates for you, the final step in the process is to configure the repository settings in Macie.

To configure the repository settings in Macie

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, under Settings, choose Discovery results.

  3. Under Repository for sensitive data discovery results, choose Create bucket.

  4. In the Create a bucket box, enter a name for the bucket. The name must be unique across all S3 buckets. In addition, the name can consist only of lowercase letters, numbers, dots (.), and hyphens (-). For additional naming requirements, see Bucket naming rules in the Amazon Simple Storage Service User Guide.

  5. Expand the Advanced section.

  6. (Optional) To specify a prefix to use in the path to a location in the bucket, enter the prefix in the Data discovery result prefix box.

    When you enter a value, Macie updates the example below the box to show the path to the bucket location where it will store your discovery results.

  7. For Block all public access, choose Yes to enable all block public access settings for the bucket. For information about these settings, see Blocking public access to your Amazon S3 storage in the Amazon Simple Storage Service User Guide.

  8. Under Encryption settings, specify the AWS KMS key that you want to use to encrypt the results:

    • To use a key from your own account, choose Select a key from your account. Then, in the AWS KMS key list, choose the key to use. The list displays customer managed, symmetric encryption KMS keys for your account.

    • To use a key that another account owns, choose Enter the ARN of a key from another account. Then, in the AWS KMS key ARN box, enter the Amazon Resource Name (ARN) of the key to use—for example, arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab.

  9. When you finish entering the settings, choose Save. Macie tests the settings to verify that they're correct. If any settings are incorrect, Macie displays an error message to help you address the issue.

After you save the repository settings, Macie adds existing discovery results for the preceding 90 days to the repository. Macie also starts adding new discovery results to the repository.
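Before you enter a bucket name in the console, you can run a quick local check against the character rules mentioned in the procedure above. The following sketch uses a hypothetical function name and covers only a subset of the Amazon S3 naming rules: the allowed characters from this topic, plus the general-purpose bucket rules that a name is 3 to 63 characters long, begins and ends with a letter or number, and doesn't contain two adjacent dots. It can't check whether the name is already taken.

```python
import re

# First and last characters: a lowercase letter or digit.
# Middle characters: lowercase letters, digits, dots, and hyphens.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes this partial set of naming checks."""
    return bool(_BUCKET_NAME.fullmatch(name)) and ".." not in name
```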

Use an existing S3 bucket that you create

If you prefer to store your sensitive data discovery results in a particular S3 bucket that you create, create and configure the bucket before you configure the repository settings in Macie.

If you enabled Object Lock for the bucket, ensure that you disable the default retention setting for that feature. Otherwise, Macie won't be able to add your discovery results to the bucket. For information about this setting, see Using S3 Object Lock in the Amazon Simple Storage Service User Guide.
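To check the Object Lock setting programmatically, you can inspect the bucket's Object Lock configuration. The following sketch uses a hypothetical function name and assumes that its input has the shape of a response from the Amazon S3 GetObjectLockConfiguration operation, for example as returned by the Boto3 get_object_lock_configuration method.

```python
from typing import Any, Dict


def has_default_retention(object_lock_response: Dict[str, Any]) -> bool:
    """Return True if the Object Lock configuration sets a default retention."""
    config = object_lock_response.get("ObjectLockConfiguration", {})
    rule = config.get("Rule", {})
    # A non-empty DefaultRetention element means the default setting is on.
    return bool(rule.get("DefaultRetention"))
```

If the function returns True for your bucket, disable the default retention setting before you configure the repository.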

Then add a bucket policy that allows Macie to retrieve information about the bucket and add (put) objects to the bucket. You can then configure the repository settings in Macie.

To add the bucket policy to the bucket

  1. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Choose the bucket that you want to store your discovery results in.

  3. Choose the Permissions tab.

  4. In the Bucket policy section, choose Edit.

  5. Copy the following example policy to your clipboard:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Allow Macie to use the GetBucketLocation operation",
                "Effect": "Allow",
                "Principal": {
                    "Service": "macie.amazonaws.com"
                },
                "Action": "s3:GetBucketLocation",
                "Resource": "arn:aws:s3:::myBucketName",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "111122223333"
                    },
                    "ArnLike": {
                        "aws:SourceArn": [
                            "arn:aws:macie2:Region:111122223333:export-configuration:*",
                            "arn:aws:macie2:Region:111122223333:classification-job/*"
                        ]
                    }
                }
            },
            {
                "Sid": "Allow Macie to add objects to the bucket",
                "Effect": "Allow",
                "Principal": {
                    "Service": "macie.amazonaws.com"
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::myBucketName/[optional prefix/]*",
                "Condition": {
                    "StringEquals": {
                        "aws:SourceAccount": "111122223333"
                    },
                    "ArnLike": {
                        "aws:SourceArn": [
                            "arn:aws:macie2:Region:111122223333:export-configuration:*",
                            "arn:aws:macie2:Region:111122223333:classification-job/*"
                        ]
                    }
                }
            },
            {
                "Sid": "Deny unencrypted object uploads. This is optional",
                "Effect": "Deny",
                "Principal": {
                    "Service": "macie.amazonaws.com"
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::myBucketName/[optional prefix/]*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                }
            },
            {
                "Sid": "Deny incorrect encryption headers. This is optional",
                "Effect": "Deny",
                "Principal": {
                    "Service": "macie.amazonaws.com"
                },
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::myBucketName/[optional prefix/]*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:Region:111122223333:key/KMSKeyId"
                    }
                }
            },
            {
                "Sid": "Deny non-HTTPS access",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "arn:aws:s3:::myBucketName/*",
                "Condition": {
                    "Bool": {
                        "aws:SecureTransport": "false"
                    }
                }
            }
        ]
    }

  6. Paste the example policy in the Bucket policy editor on the Amazon S3 console.

  7. Update the bucket policy with the correct values for your environment:

    • In the optional statement that denies incorrect encryption headers:

      • Replace myBucketName with the name of the bucket.

      • In the StringNotEquals condition, replace the placeholder value for the specified field with the Amazon Resource Name (ARN) of the AWS KMS key to use for encryption of your discovery results.

    • In all other statements, replace the placeholder values, where:

      • myBucketName is the name of the bucket.

      • Region is the AWS Region in which you're using Macie and want to allow Macie to add discovery results to the bucket.

        If you use Macie in multiple Regions and want to allow Macie to add results to the bucket for additional Regions, add aws:SourceArn conditions for each additional Region. For example:

        "aws:SourceArn": [
            "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
            "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
            "arn:aws:macie2:us-west-2:111122223333:export-configuration:*",
            "arn:aws:macie2:us-west-2:111122223333:classification-job/*"
        ]

        Alternatively, you can allow Macie to add results to the bucket for all Regions in which you use Macie. To do this, replace the placeholder value with the wildcard character (*). For example:

        "aws:SourceArn": [
            "arn:aws:macie2:*:111122223333:export-configuration:*",
            "arn:aws:macie2:*:111122223333:classification-job/*"
        ]

      • 111122223333 is the account ID for your AWS account.

    • If you're using Macie in a manually enabled AWS Region, add the appropriate Region code to the value for the Service field in each statement that specifies the Macie service principal. For example, if you're using Macie in the Middle East (Bahrain) Region, which has the Region code me-south-1, replace macie.amazonaws.com with macie.me-south-1.amazonaws.com in each applicable statement.

    Note that the example policy includes statements that allow Macie to determine which Region the bucket resides in (GetBucketLocation) and to add objects to the bucket (PutObject). These statements define conditions that use two IAM global condition keys:

    • aws:SourceAccount – This condition allows Macie to add sensitive data discovery results to the bucket only for your account. It prevents Macie from adding discovery results for other accounts to the bucket. More specifically, the condition specifies which account can use the bucket for the resources and actions specified by the aws:SourceArn condition.

      To store results for additional accounts in the bucket, add the account ID for each additional account to this condition. For example:

      "aws:SourceAccount": ["111122223333", "444455556666"]

    • aws:SourceArn – This condition restricts access to the bucket based on the source of the objects that are being added to the bucket. It prevents other AWS services from adding objects to the bucket. It also prevents Macie from adding objects to the bucket while performing other actions for your account. More specifically, the condition allows Macie to add objects to the bucket only if the objects are sensitive data discovery results, and only if those results are for sensitive data discovery jobs that are created by the account and in the Region specified in the condition.

      To allow Macie to perform the specified actions for additional accounts, add ARNs for each additional account to this condition. For example:

      "aws:SourceArn": [
          "arn:aws:macie2:us-east-1:111122223333:export-configuration:*",
          "arn:aws:macie2:us-east-1:111122223333:classification-job/*",
          "arn:aws:macie2:us-east-1:444455556666:export-configuration:*",
          "arn:aws:macie2:us-east-1:444455556666:classification-job/*"
      ]

      The accounts specified by the aws:SourceAccount and aws:SourceArn conditions should match.

    Both conditions help prevent Macie from being used as a confused deputy during transactions with Amazon S3. Although we don't recommend it, you can remove these conditions from the bucket policy.

  8. When you finish updating the bucket policy, choose Save changes.

Important

If you change the bucket path after you configure the repository settings in Macie, you have to update the bucket policy. Otherwise, Macie won't be allowed to add discovery results to the bucket.
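Because the policy depends on the bucket name, prefix, account, and Regions, it may be easier to regenerate it than to edit it whenever one of those values changes. The following Python sketch assembles the example policy from this section as a dictionary. The function name and parameters are hypothetical. The sketch assumes that a prefix should end with a slash, as in the example policy, and it adds the two optional encryption statements only when you supply a key ARN.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def build_bucket_policy(
    bucket: str,
    account_id: str,
    regions: Sequence[str],
    prefix: str = "",
    kms_key_arn: Optional[str] = None,
    service: str = "macie.amazonaws.com",
) -> Dict[str, Any]:
    """Assemble the example bucket policy as a Python dict."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # assumption: the prefix is used as a folder-like path
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/{prefix}*"

    source_arns: List[str] = []
    for region in regions:  # pass ["*"] to cover all Regions
        base = f"arn:aws:macie2:{region}:{account_id}"
        source_arns += [f"{base}:export-configuration:*", f"{base}:classification-job/*"]
    condition = {
        "StringEquals": {"aws:SourceAccount": account_id},
        "ArnLike": {"aws:SourceArn": source_arns},
    }
    principal = {"Service": service}

    def deny_put(sid: str, key: str, value: str) -> Dict[str, Any]:
        """Deny PutObject by Macie when a request header has the wrong value."""
        return {
            "Sid": sid, "Effect": "Deny", "Principal": principal,
            "Action": "s3:PutObject", "Resource": objects_arn,
            "Condition": {"StringNotEquals": {key: value}},
        }

    statements: List[Dict[str, Any]] = [
        {"Sid": "Allow Macie to use the GetBucketLocation operation",
         "Effect": "Allow", "Principal": principal,
         "Action": "s3:GetBucketLocation", "Resource": bucket_arn,
         "Condition": condition},
        {"Sid": "Allow Macie to add objects to the bucket",
         "Effect": "Allow", "Principal": principal,
         "Action": "s3:PutObject", "Resource": objects_arn,
         "Condition": condition},
    ]
    if kms_key_arn:  # the two optional statements from the example policy
        statements.append(deny_put(
            "Deny unencrypted object uploads. This is optional",
            "s3:x-amz-server-side-encryption", "aws:kms"))
        statements.append(deny_put(
            "Deny incorrect encryption headers. This is optional",
            "s3:x-amz-server-side-encryption-aws-kms-key-id", kms_key_arn))
    statements.append(
        {"Sid": "Deny non-HTTPS access", "Effect": "Deny", "Principal": "*",
         "Action": "s3:*", "Resource": f"{bucket_arn}/*",
         "Condition": {"Bool": {"aws:SecureTransport": "false"}}})
    return {"Version": "2012-10-17", "Statement": statements}
```

You can serialize the result with json.dumps(policy, indent=4) and paste it into the Bucket policy editor. Note that the prefix appears in the Resource element of the PutObject statements, which is why a changed path requires an updated policy.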

You can now configure the repository settings in Macie.

To configure the repository settings in Macie

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, under Settings, choose Discovery results.

  3. Under Repository for sensitive data discovery results, choose Existing bucket.

  4. For Choose a bucket, select the bucket that you want to store your discovery results in.

  5. (Optional) To specify a prefix to use in the path to a location in the bucket, expand the Advanced section. Then, for Data discovery result prefix, enter the prefix to use.

    When you enter a value, Macie updates the example below the box to show the path to the bucket location where it will store your discovery results.

  6. Under Encryption settings, specify the AWS KMS key that you want to use to encrypt the results:

    • To use a key from your own account, choose Select a key from your account. Then, in the AWS KMS key list, choose the key to use. The list displays customer managed, symmetric encryption KMS keys for your account.

    • To use a key that another account owns, choose Enter the ARN of a key from another account. Then, in the AWS KMS key ARN box, enter the ARN of the key to use—for example, arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab.

  7. When you finish entering the settings, choose Save. Macie tests the settings to verify that they're correct. If any settings are incorrect, Macie displays an error message to help you address the issue.

After you save the repository settings, Macie adds existing discovery results for the preceding 90 days to the repository. Macie also starts adding new discovery results to the repository.
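To confirm the saved settings without opening the console, you could read them back with the GetClassificationExportConfiguration operation of the Amazon Macie API. The following sketch uses a hypothetical function name and expects a Macie client object, such as one created with boto3.client("macie2"), to be passed in.

```python
from typing import Any, Optional


def repository_matches(
    client: Any,
    bucket_name: str,
    kms_key_arn: str,
    key_prefix: Optional[str] = None,
) -> bool:
    """Return True if the stored settings name the expected bucket and key."""
    response = client.get_classification_export_configuration()
    destination = response.get("configuration", {}).get("s3Destination", {})
    return (
        destination.get("bucketName") == bucket_name
        and destination.get("kmsKeyArn") == kms_key_arn
        # The prefix is compared only when the caller supplies one.
        and (key_prefix is None or destination.get("keyPrefix") == key_prefix)
    )
```

Because the settings are stored per Region, run the check with a client for each Region in which you use Macie.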