Assessing automated sensitive data discovery coverage - Amazon Macie

As automated sensitive data discovery progresses for your account, Amazon Macie provides statistics and details to help you assess and monitor its coverage of your Amazon Simple Storage Service (Amazon S3) data estate. With this data, you can check the status of automated sensitive data discovery for your data estate overall and for individual S3 buckets in your bucket inventory. You can also identify issues that prevented Macie from analyzing objects in specific buckets. If you remediate the issues, you can increase coverage of your Amazon S3 data during subsequent analysis cycles.

Coverage data provides a snapshot of the current status of automated sensitive data discovery for your S3 general purpose buckets in the current AWS Region. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. For each bucket, the data indicates whether issues occurred when Macie attempted to analyze objects in the bucket. If issues occurred, the data indicates the nature of each issue and, in certain cases, the number of occurrences. The data is updated as automated sensitive data discovery progresses for your account each day. If Macie analyzes or attempts to analyze one or more objects in a bucket during a daily analysis cycle, Macie updates coverage and other data to reflect the results.

For certain types of issues, you can review the data in aggregate for all of your S3 general purpose buckets and optionally drill down for additional details about each bucket. For example, coverage data can help you quickly identify all the buckets that Macie isn't allowed to access for your account. Coverage data also reports object-level issues that occurred. These issues, referred to as classification errors, prevented Macie from analyzing specific objects in a bucket. For example, you can determine how many objects Macie couldn't analyze in a bucket because the objects are encrypted with an AWS Key Management Service (AWS KMS) key that's no longer available.

If you use the Amazon Macie console to review coverage data, your view of the data includes guidance for remediating each type of issue. Subsequent topics in this section also provide remediation guidance for each type.

Reviewing automated sensitive data discovery coverage data

To review and assess automated sensitive data discovery coverage for your account, you can use the Amazon Macie console or the Amazon Macie API. Both the console and the API provide data that indicates the current status of the analyses for your Amazon Simple Storage Service (Amazon S3) general purpose buckets in the current AWS Region. The data includes information about issues that create gaps in the analyses:

  • Buckets that Macie isn't allowed to access. Macie can't analyze any objects in these buckets because the buckets' permissions settings prevent Macie from accessing the buckets and the buckets' objects.

  • Buckets that don't store any classifiable objects. Macie can't analyze any objects in these buckets because all the objects use Amazon S3 storage classes that Macie doesn't support, or they have file name extensions for file or storage formats that Macie doesn't support.

  • Buckets that Macie hasn’t been able to analyze yet due to object-level classification errors. Macie attempted to analyze one or more objects in these buckets. However, Macie couldn't analyze the objects due to issues with object-level permissions settings, object content, or quotas.

Coverage data is updated as automated sensitive data discovery progresses for your account each day. If you're the Macie administrator for an organization, the data includes information for S3 buckets that your member accounts own.

Note

Coverage data doesn't explicitly include results for sensitive data discovery jobs that you've created and run. However, remediating coverage issues that affect your automated sensitive data discovery results is likely to also increase coverage by sensitive data discovery jobs that you subsequently run. To assess coverage for a job, review the job's statistics and results. If a job's log events or other results indicate coverage issues, the remediation guidance later in this section can help you address some of the issues.

To review automated sensitive data discovery coverage data

You can use the Amazon Macie console or the Amazon Macie API to review coverage data for your account or organization. On the console, a single page provides a unified view of coverage data for all of your S3 general purpose buckets, including a rollup of issues that recently occurred for each bucket. The page also provides options for reviewing groups of data by issue type. To track your investigation of issues for specific buckets, you can export data from the page to a comma-separated values (CSV) file.

Console

Follow these steps to review automated sensitive data discovery coverage data by using the Amazon Macie console.

To review coverage data
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Resource coverage.

  3. On the Resource coverage page, choose the tab for the type of coverage data that you want to review:

    • All – Lists all the buckets that Macie monitors and analyzes for your account.

      For each bucket, the Issues field indicates whether issues prevented Macie from analyzing objects in the bucket. If the value for this field is None, Macie has analyzed at least one of the bucket's objects or Macie hasn't attempted to analyze any of the bucket's objects yet. If there are issues, this field indicates the nature of the issues and how to remediate them. For object-level classification errors, it might also indicate (in parentheses) the number of occurrences of the error.

    • Access denied – Lists buckets that Macie isn't allowed to access. The permissions settings for these buckets prevent Macie from accessing the buckets and the buckets' objects. Consequently, Macie can't analyze any objects in these buckets.

    • Classification error – Lists buckets that Macie hasn’t analyzed yet due to object-level classification errors—issues with object-level permissions settings, object content, or quotas.

      For each bucket, the Issues field indicates the nature of each type of error that occurred and prevented Macie from analyzing an object in the bucket. It also indicates how to remediate each type of error. Depending on the error, it might also indicate (in parentheses) the number of occurrences of the error.

    • Unclassifiable – Lists buckets that Macie can't analyze because they don't store any classifiable objects. All the objects in these buckets use unsupported Amazon S3 storage classes or have file name extensions for unsupported file or storage formats. Consequently, Macie can't analyze any objects in these buckets.

  4. To drill down and review the supporting data for a bucket, choose the bucket's name. Then refer to the bucket details panel for statistics and other information about the bucket.

  5. To export the table to a CSV file, choose Export to CSV at the top of the page. The resulting CSV file contains a subset of metadata for each bucket in the table, for up to 50,000 buckets. The file includes a Coverage issues field. The value for this field indicates whether issues prevented Macie from analyzing objects in the bucket and, if so, the nature of the issues.

API

To review coverage data programmatically, specify filter criteria in queries that you submit using the DescribeBuckets operation of the Amazon Macie API. This operation returns an array of objects. Each object contains statistical data and other information about an S3 general purpose bucket that matches the filter criteria.

In the filter criteria, include a condition for the type of coverage data that you want to review:

  • To identify buckets that Macie isn't allowed to access due to the buckets' permissions settings, include a condition where the value for the errorCode field equals ACCESS_DENIED.

  • To identify buckets that Macie is allowed to access and hasn't analyzed yet, include conditions where the value for the sensitivityScore field equals 50 and the value for the errorCode field doesn't equal ACCESS_DENIED.

  • To identify buckets that Macie can't analyze because all the buckets' objects use unsupported storage classes or formats, include conditions where the value for the classifiableSizeInBytes field equals 0 and the value for the sizeInBytes field is greater than 0.

  • To identify buckets for which Macie has analyzed at least one object, include conditions where the value for the sensitivityScore field falls within the range of 1–99 but is not equal to 50. To also include buckets where you manually assigned the maximum score, extend the range to 1–100.

  • To identify buckets that Macie hasn’t analyzed yet due to object-level classification errors, include a condition where the value for the sensitivityScore field equals -1. To then review a breakdown of the types and number of errors that occurred for a particular bucket, use the GetResourceProfile operation.
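The conditions in the preceding list can be sketched as criteria maps in Python. This is a minimal sketch, not a definitive implementation: the field names and values come from the list, but the operator encoding (eq and neq as lists of strings; gt, gte, and lte as numbers) is an assumption about the DescribeBuckets criteria format, and every constant and function name is hypothetical.

```python
import json

# Field names and values mirror the conditions listed above.
# ASSUMPTION: eq/neq take lists of strings; gt/gte/lte take numbers.

ACCESS_DENIED = {"errorCode": {"eq": ["ACCESS_DENIED"]}}

NOT_YET_ANALYZED = {
    "sensitivityScore": {"eq": ["50"]},
    "errorCode": {"neq": ["ACCESS_DENIED"]},
}

UNCLASSIFIABLE = {
    "classifiableSizeInBytes": {"eq": ["0"]},
    "sizeInBytes": {"gt": 0},
}

CLASSIFICATION_ERROR = {"sensitivityScore": {"eq": ["-1"]}}


def analyzed_criteria(include_manual_max: bool = False) -> dict:
    """Buckets with at least one analyzed object: score 1-99 (or 1-100), never 50."""
    upper = 100 if include_manual_max else 99
    return {"sensitivityScore": {"gte": 1, "lte": upper, "neq": ["50"]}}


def as_cli_argument(criteria: dict) -> str:
    """Serialize a criteria map as compact JSON, suitable for --criteria."""
    return json.dumps(criteria, separators=(",", ":"))
```

With this sketch, as_cli_argument(ACCESS_DENIED) produces the same JSON string that the AWS CLI example for access-denied buckets passes to the --criteria option.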

If you're using the AWS Command Line Interface (AWS CLI), specify filter criteria in queries that you submit by running the describe-buckets command. To review a breakdown of the types and number of errors that occurred for a particular S3 bucket, if any, run the get-resource-profile command.
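As a rough illustration of the error breakdown, the following Python sketch summarizes a GetResourceProfile response that has already been retrieved, for example with the get_resource_profile method of an AWS SDK for Python (Boto3) macie2 client. The statistics field names are assumed to match the operation's response. The function name and the "other" bucket, which may include invalid-content errors, are hypothetical.

```python
def classification_error_breakdown(profile: dict) -> dict:
    """Summarize skipped-object counts from a GetResourceProfile response.

    ASSUMPTION: the response has a "statistics" map with the
    totalItemsSkipped* counters used below.
    """
    stats = profile.get("statistics", {})
    breakdown = {
        "invalid_encryption": stats.get("totalItemsSkippedInvalidEncryption", 0),
        "invalid_kms_key": stats.get("totalItemsSkippedInvalidKms", 0),
        "permission_denied": stats.get("totalItemsSkippedPermissionDenied", 0),
    }
    total = stats.get("totalItemsSkipped", 0)
    # Whatever is not attributed to a specific counter is reported as "other".
    breakdown["other"] = max(total - sum(breakdown.values()), 0)
    breakdown["total_skipped"] = total
    return breakdown
```

The function takes the response as a plain dictionary, so it can be exercised without AWS access.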

For example, the following AWS CLI commands use filter criteria to retrieve the details of all the S3 buckets that Macie isn't allowed to access due to the buckets' permissions settings.

This example is formatted for Linux, macOS, or Unix:

$ aws macie2 describe-buckets --criteria '{"errorCode":{"eq":["ACCESS_DENIED"]}}'

This example is formatted for Microsoft Windows:

C:\> aws macie2 describe-buckets --criteria={\"errorCode\":{\"eq\":[\"ACCESS_DENIED\"]}}

If your request succeeds, Macie returns a buckets array. The array contains an object for each S3 bucket that’s in the current AWS Region and matches the filter criteria.

If no S3 buckets match the filter criteria, Macie returns an empty buckets array.

{ "buckets": [] }
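When a query is submitted from code rather than the AWS CLI, the result might span several pages. The following Python sketch collects every matching bucket and treats an empty buckets array as no results. It assumes a client object that exposes a describe_buckets method with criteria and nextToken parameters, as the Boto3 macie2 client does. The function name is hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_buckets(client: Any, criteria: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each bucket that matches the criteria, following nextToken."""
    kwargs: Dict[str, Any] = {"criteria": criteria}
    while True:
        page = client.describe_buckets(**kwargs)
        # An empty "buckets" array simply yields nothing for this page.
        yield from page.get("buckets", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   client = boto3.client("macie2")
#   for bucket in iter_buckets(client, {"errorCode": {"eq": ["ACCESS_DENIED"]}}):
#       print(bucket["bucketName"])
```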

For more information about specifying filter criteria in queries, including examples of common criteria, see Filtering your S3 bucket inventory.

Remediating coverage issues for automated sensitive data discovery

Amazon Macie reports several types of issues that reduce automated sensitive data discovery coverage of your Amazon Simple Storage Service (Amazon S3) data. The following information can help you investigate and remediate these issues.

Tip

To investigate object-level classification errors for an S3 bucket, start by reviewing the list of object samples for the bucket. This list indicates which objects Macie analyzed or attempted to analyze in the bucket, for up to 100 objects.

To review the list on the Amazon Macie console, choose the bucket on the S3 buckets page, and then choose the Object samples tab in the bucket details panel. To review the list programmatically, use the ListResourceProfileArtifacts operation of the Amazon Macie API. If the status of the analysis for an object is Skipped (SKIPPED), the object might have caused the error.
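The programmatic review could look like the following Python sketch, which returns the ARNs of object samples whose analysis status is SKIPPED. It assumes a client that exposes list_resource_profile_artifacts with resourceArn and nextToken parameters, and artifacts that carry arn and classificationResultStatus fields. The function name is hypothetical.

```python
from typing import Any, Dict, List


def skipped_object_arns(client: Any, bucket_arn: str) -> List[str]:
    """Return ARNs of sampled objects whose analysis status is SKIPPED."""
    arns: List[str] = []
    kwargs: Dict[str, Any] = {"resourceArn": bucket_arn}
    while True:
        page = client.list_resource_profile_artifacts(**kwargs)
        for artifact in page.get("artifacts", []):
            # ASSUMPTION: the status is reported in classificationResultStatus.
            if artifact.get("classificationResultStatus") == "SKIPPED":
                arns.append(artifact["arn"])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    return arns
```

A skipped object is only a candidate cause of the error. Confirm the cause by checking the object's encryption and permissions settings.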

Access denied

This issue indicates that an S3 bucket's permissions settings prevent Macie from accessing the bucket and the bucket’s objects. Macie can't retrieve and analyze any objects in the bucket.

Details

The most common cause for this type of issue is a restrictive bucket policy. A bucket policy is a resource-based AWS Identity and Access Management (IAM) policy that specifies which actions a principal (user, account, service, or other entity) can perform on an S3 bucket, and the conditions under which a principal can perform those actions. A restrictive bucket policy uses explicit Allow or Deny statements that grant or restrict access to a bucket's data based on specific conditions. For example, a bucket policy might contain an Allow or Deny statement that restricts access to a bucket unless specific source IP addresses are used to access the bucket.

If the bucket policy for an S3 bucket contains an explicit Deny statement with one or more conditions, Macie might not be allowed to retrieve and analyze the bucket’s objects to detect sensitive data. Macie can only provide a subset of information about the bucket, such as the bucket's name and creation date.

Remediation guidance

To remediate this issue, update the bucket policy for the S3 bucket. Ensure that the policy allows Macie to access the bucket and the bucket’s objects. To allow this access, add a condition for the Macie service-linked role (AWSServiceRoleForAmazonMacie) to the policy. The condition should exclude the Macie service-linked role from matching the Deny restriction in the policy. It can do this by using the aws:PrincipalArn global condition context key and the Amazon Resource Name (ARN) of the Macie service-linked role for your account.
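One way to express that condition is sketched below in Python. The sketch returns a copy of a bucket policy in which each conditional Deny statement gains a StringNotLike condition on aws:PrincipalArn for the Macie service-linked role. It assumes the standard "aws" partition and the usual service-linked role path, and it leaves unconditional Deny statements untouched. All function names are hypothetical, so review the resulting policy before you apply it.

```python
import copy


def macie_role_arn(account_id: str, partition: str = "aws") -> str:
    """Build the ARN of the Macie service-linked role (assumed path format)."""
    return (
        f"arn:{partition}:iam::{account_id}:role/aws-service-role/"
        "macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
    )


def exempt_macie_from_deny(policy: dict, account_id: str) -> dict:
    """Return a policy copy whose conditional Deny statements exclude Macie."""
    updated = copy.deepcopy(policy)
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be a list
        statements = [statements]
        updated["Statement"] = statements
    arn = macie_role_arn(account_id)
    for statement in statements:
        if statement.get("Effect") != "Deny" or "Condition" not in statement:
            continue
        not_like = statement["Condition"].setdefault("StringNotLike", {})
        existing = not_like.get("aws:PrincipalArn", [])
        if isinstance(existing, str):
            existing = [existing]
        if arn not in existing:
            existing = existing + [arn]
        # Keep a single value as a plain string, as hand-written policies often do.
        not_like["aws:PrincipalArn"] = existing[0] if len(existing) == 1 else existing
    return updated
```

Because condition operators within a statement are combined with a logical AND, the Deny statement continues to apply to every principal except the Macie service-linked role.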

If you update the bucket policy and Macie gains access to the S3 bucket, Macie will detect the change. When this happens, Macie will update statistics, inventory data, and other information that it provides about your Amazon S3 data. In addition, the bucket's objects will be a higher priority for analysis during a subsequent analysis cycle.

Additional reference

For more information about updating an S3 bucket policy to allow Macie to access a bucket, see Allowing Amazon Macie to access S3 buckets and objects. For information about using bucket policies to control access to buckets, see Bucket policies and user policies and How Amazon S3 authorizes a request in the Amazon Simple Storage Service User Guide.

Classification error: Invalid content

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is malformed or the object contains content that exceeds a sensitive data discovery quota. Macie can't analyze the object.

Details

This error typically occurs because an S3 object is a malformed or corrupted file. Consequently, Macie can't parse and analyze all the data in the file.

This error can also occur if analysis of an S3 object would exceed a sensitive data discovery quota for an individual file. For example, this can happen if the storage size of the object exceeds the size quota for that type of file.

In either case, Macie can't complete its analysis of the S3 object, and the status of the analysis for the object is Skipped (SKIPPED).

Remediation guidance

To investigate this error, download the S3 object and check the formatting and contents of the file. Also assess the contents of the file against Macie quotas for sensitive data discovery.

If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.

Additional reference

For a list of sensitive data discovery quotas, including the quotas for certain types of files, see Amazon Macie quotas. For information about how Macie updates sensitivity scores and other information that it provides about S3 buckets, see How automated sensitive data discovery works.

Classification error: Invalid encryption

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is encrypted with a customer-provided key. The object uses SSE-C encryption, which means that Macie can't retrieve and analyze the object.

Details

Amazon S3 supports multiple encryption options for S3 objects. For most of these options, Macie can decrypt an object by using the Macie service-linked role for your account. However, this depends on the type of encryption that was used.

For Macie to decrypt an S3 object, the object must be encrypted with a key that Macie can access and is allowed to use. If an object is encrypted with a customer-provided key, Macie can't provide the requisite key material to retrieve the object from Amazon S3. Consequently, Macie can't analyze the object and the status of the analysis for the object is Skipped (SKIPPED).

Remediation guidance

To remediate this error, encrypt S3 objects with Amazon S3 managed keys or AWS Key Management Service (AWS KMS) keys. If you prefer to use AWS KMS keys, the keys can be AWS managed KMS keys, or customer managed KMS keys that Macie is allowed to use.

To encrypt existing S3 objects with keys that Macie can access and use, you can change the encryption settings for the objects. To encrypt new objects with keys that Macie can access and use, change the default encryption settings for the S3 bucket. Also ensure that the bucket's policy doesn't require new objects to be encrypted with a customer-provided key.
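For the default encryption settings, the following Python sketch builds a configuration that could be passed as the ServerSideEncryptionConfiguration argument of the Amazon S3 put_bucket_encryption method in Boto3. The function name is hypothetical. The sketch doesn't cover re-encrypting existing objects or changing the bucket policy.

```python
from typing import Optional


def default_encryption_config(kms_key_arn: Optional[str] = None) -> dict:
    """Build default bucket encryption settings that avoid customer-provided keys.

    Without a key ARN, Amazon S3 managed keys (SSE-S3) are used. With a key
    ARN, the AWS KMS key is used; Macie must be allowed to use that key.
    """
    if kms_key_arn is None:
        default = {"SSEAlgorithm": "AES256"}
    else:
        default = {"SSEAlgorithm": "aws:kms", "KMSMasterKeyID": kms_key_arn}
    return {"Rules": [{"ApplyServerSideEncryptionByDefault": default}]}


# Example usage (requires AWS credentials and the boto3 package):
#   import boto3
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="amzn-s3-demo-bucket",
#       ServerSideEncryptionConfiguration=default_encryption_config(),
#   )
```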

If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.

Additional reference

For information about requirements and options for using Macie to analyze encrypted S3 objects, see Analyzing encrypted Amazon S3 objects with Amazon Macie. For information about encryption options and settings for S3 buckets, see Protecting data with encryption and Setting default server-side encryption behavior for S3 buckets in the Amazon Simple Storage Service User Guide.

Classification error: Invalid KMS key

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and the object is encrypted with an AWS Key Management Service (AWS KMS) key that's no longer available. Macie can't retrieve and analyze the object.

Details

AWS KMS provides options for disabling and deleting customer managed AWS KMS keys. If an S3 object is encrypted with a KMS key that is disabled, is scheduled for deletion, or was deleted, Macie can't retrieve and decrypt the object. Consequently, Macie can't analyze the object and the status of the analysis for the object is Skipped (SKIPPED). For Macie to analyze an encrypted object, the object must be encrypted with a key that Macie can access and is allowed to use.

Remediation guidance

To remediate this error, re-enable or cancel the scheduled deletion of the applicable AWS KMS key, depending on the current status of the key. If the applicable key was already deleted, this error cannot be remediated.
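The dependency on key status can be sketched as a small Python lookup. It maps the key state that the AWS KMS DescribeKey operation reports to the AWS KMS operations that would make the key usable again. After a scheduled deletion is canceled, the key is left disabled, so EnableKey follows CancelKeyDeletion. The function name is hypothetical, and other key states are deliberately not handled.

```python
from typing import List


def kms_remediation_steps(key_state: str) -> List[str]:
    """Return the AWS KMS operations needed to make a key usable again.

    key_state is the KeyState value reported by DescribeKey. A key that was
    already deleted can't be described or recovered, so it isn't covered here.
    """
    if key_state == "Enabled":
        return []
    if key_state == "Disabled":
        return ["EnableKey"]
    if key_state == "PendingDeletion":
        # Canceling deletion leaves the key disabled, so enable it afterward.
        return ["CancelKeyDeletion", "EnableKey"]
    raise ValueError(f"No remediation sketched for key state: {key_state}")
```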

To determine which AWS KMS key was used to encrypt an S3 object, you can start by using Macie to review the server-side encryption settings for the S3 bucket. If the default encryption settings for the bucket are configured to use a KMS key, the bucket's details indicate which key is used. You can then check the status of that key. Alternatively, you can use Amazon S3 to review the encryption settings for the bucket and individual objects in the bucket.

If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.

Additional reference

For information about using Macie to review the server-side encryption settings for an S3 bucket, see Reviewing the details of S3 buckets. For information about re-enabling or canceling the scheduled deletion of an AWS KMS key, see Enabling and disabling keys and Scheduling and canceling key deletion in the AWS Key Management Service Developer Guide.

Classification error: Permission denied

This type of classification error occurs if Macie attempts to analyze an object in an S3 bucket and Macie can't retrieve or decrypt the object due to the permissions settings for the object or the permissions settings for the key that was used to encrypt the object. Macie can't retrieve and analyze the object.

Details

This error typically occurs because an S3 object is encrypted with a customer managed AWS Key Management Service (AWS KMS) key that Macie isn’t allowed to use. If an object is encrypted with a customer managed AWS KMS key, the key's policy must allow Macie to decrypt data by using the key.

This error can also occur if Amazon S3 permissions settings prevent Macie from retrieving an S3 object. The bucket policy for the S3 bucket might restrict access to specific bucket objects or allow only certain principals (users, accounts, services, or other entities) to access the objects. Or the access control list (ACL) for an object might restrict access to the object. Consequently, Macie might not be allowed to access the object.

In any of the preceding cases, Macie can't retrieve and analyze the object, and the status of the analysis for the object is Skipped (SKIPPED).

Remediation guidance

To remediate this error, determine whether the S3 object is encrypted with a customer managed AWS KMS key. If it is, ensure that the key's policy allows the Macie service-linked role (AWSServiceRoleForAmazonMacie) to decrypt data with the key. How you allow this access depends on whether the account that owns the AWS KMS key also owns the S3 bucket that stores the object. If the same account owns the KMS key and the bucket, a user of the account has to update the key's policy. If one account owns the KMS key and a different account owns the bucket, a user of the account that owns the key has to allow cross-account access to the key.
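For the case where the same account owns the KMS key and the bucket, the key policy update might resemble the following Python sketch. It appends a statement that allows the Macie service-linked role to perform kms:Decrypt. It assumes the standard "aws" partition and the usual service-linked role path. The Sid value and function names are hypothetical, and the cross-account case, which needs additional settings, is not covered.

```python
import copy

MACIE_SID = "AllowMacieServiceLinkedRoleToDecrypt"  # hypothetical statement ID


def macie_decrypt_statement(account_id: str, partition: str = "aws") -> dict:
    """Build a key policy statement that lets the Macie role decrypt data."""
    role_arn = (
        f"arn:{partition}:iam::{account_id}:role/aws-service-role/"
        "macie.amazonaws.com/AWSServiceRoleForAmazonMacie"
    )
    return {
        "Sid": MACIE_SID,
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }


def add_macie_statement(key_policy: dict, account_id: str) -> dict:
    """Return a key policy copy that includes the Macie statement exactly once."""
    updated = copy.deepcopy(key_policy)
    statements = updated.setdefault("Statement", [])
    if not any(s.get("Sid") == MACIE_SID for s in statements):
        statements.append(macie_decrypt_statement(account_id))
    return updated
```

The updated policy document could then be serialized to JSON and applied with the AWS KMS PutKeyPolicy operation.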

Tip

You can automatically generate a list of all the customer managed AWS KMS keys that Macie needs to access to analyze objects in the S3 buckets for your account. To do this, run the AWS KMS Permission Analyzer script, which is available from the Amazon Macie Scripts repository on GitHub. The script can also generate an additional script of AWS Command Line Interface (AWS CLI) commands. You can optionally run those commands to update the requisite configuration settings and policies for KMS keys that you specify.

If Macie is already allowed to use the applicable AWS KMS key or the S3 object isn't encrypted with a customer managed KMS key, ensure that the bucket's policy allows Macie to access the object. Also verify that the object's ACL allows Macie to read the object's data and metadata.

For the bucket policy, you can allow this access by adding a condition for the Macie service-linked role to the policy. The condition should exclude the Macie service-linked role from matching the Deny restriction in the policy. It can do this by using the aws:PrincipalArn global condition context key and the Amazon Resource Name (ARN) of the Macie service-linked role for your account.

For the object ACL, you can allow this access by working with the object owner to add your AWS account as a grantee with READ permissions for the object. Macie can then use the service-linked role for your account to retrieve and analyze the object. Also consider changing the Object Ownership settings for the bucket. You can use these settings to disable ACLs for all the objects in the bucket and grant ownership permissions to the account that owns the bucket.

If you don't remediate this error, Macie will try to analyze other objects in the S3 bucket. If Macie analyzes another object successfully, Macie will update coverage data and other information that it provides about the bucket.

Additional reference

For more information about allowing Macie to decrypt data with a customer managed AWS KMS key, see Allowing Amazon Macie to use a customer managed AWS KMS key. For information about updating an S3 bucket policy to allow Macie to access a bucket, see Allowing Amazon Macie to access S3 buckets and objects.

For information about updating a key policy, see Changing a key policy in the AWS Key Management Service Developer Guide. For information about using customer managed AWS KMS keys to encrypt S3 objects, see Using server-side encryption with AWS KMS keys in the Amazon Simple Storage Service User Guide.

For information about using bucket policies to control access to S3 buckets, see Bucket policies and user policies and How Amazon S3 authorizes a request in the Amazon Simple Storage Service User Guide. For information about using ACLs or Object Ownership settings to control access to S3 objects, see Managing access with ACLs and Controlling ownership of objects and disabling ACLs for your bucket in the Amazon Simple Storage Service User Guide.

Unclassifiable

This issue indicates that all the objects in an S3 bucket are stored using unsupported Amazon S3 storage classes or unsupported file or storage formats. Macie can't analyze any objects in the bucket.

Details

To be eligible for selection and analysis, an S3 object must use an Amazon S3 storage class that Macie supports. The object must also have a file name extension for a file or storage format that Macie supports. If an object doesn't meet these criteria, the object is treated as an unclassifiable object. Macie doesn't attempt to retrieve or analyze data in unclassifiable objects.

If all the objects in an S3 bucket are unclassifiable objects, the overall bucket is an unclassifiable bucket. Macie can't perform automated sensitive data discovery for the bucket.

Remediation guidance

To address this issue, review lifecycle configuration rules and other settings that determine which storage classes are used to store objects in the S3 bucket. Consider adjusting those settings to use storage classes that Macie supports. You can also change the storage class of existing objects in the bucket.

Also assess the file and storage formats of existing objects in the S3 bucket. To analyze the objects, consider porting the data, either temporarily or permanently, to new objects that use a supported format.

If objects are added to the S3 bucket and they use a supported storage class and format, Macie will detect the objects the next time it evaluates your bucket inventory. When this happens, Macie will stop reporting that the bucket is unclassifiable in statistics, coverage data, and other information that it provides about your Amazon S3 data. In addition, the new objects will be a higher priority for analysis during a subsequent analysis cycle.

Additional reference

For information about the Amazon S3 storage classes and the file and storage formats that Macie supports, see Storage classes and formats supported by Amazon Macie. For information about lifecycle configuration rules and the storage class options that Amazon S3 provides, see Managing your storage lifecycle and Using Amazon S3 storage classes in the Amazon Simple Storage Service User Guide.