Locating sensitive data with Amazon Macie findings

When you run a sensitive data discovery job, Amazon Macie performs a deep inspection of the latest version of each Amazon Simple Storage Service (Amazon S3) object that you configure the job to analyze. Macie also uses a depth-first search algorithm to populate the job's findings with details about the location of specific occurrences of sensitive data that Macie finds. These details provide insight into the categories and types of sensitive data that the affected S3 bucket and object contain. They can help you decide whether to investigate specific buckets and objects further, and they can help you locate individual occurrences of sensitive data in S3 objects.

With sensitive data findings, you can determine the location of as many as 15 occurrences of sensitive data that Macie finds in an affected S3 object. This includes sensitive data that Macie detects using managed data identifiers, and data that matches the criteria of custom data identifiers that you configure a job to use.

A sensitive data finding can provide details such as:

  • The column and row number for a cell or field in a Microsoft Excel workbook, CSV file, or TSV file.

  • The path to a field or array in a JSON or JSON Lines file.

  • The line number for a line in a non-binary text file other than a CSV, JSON, JSON Lines, or TSV file, such as an HTML, TXT, or XML file.

  • The page number for a page in an Adobe Portable Document Format (PDF) file.

  • The record index and the path to a field in a record in an Apache Avro object container or Apache Parquet file.
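In a finding, these location types correspond to arrays in an occurrences object: cells for workbooks and CSV or TSV files, lineRanges for other text files, pages for PDF files, and records for JSON, JSON Lines, Avro, and Parquet files. The following minimal Python sketch assumes that an occurrences object has already been parsed into a dictionary and converts it into readable location strings. The function name describe_occurrences is hypothetical, and the sketch ignores other fields, such as columnName or offsetRanges, that an occurrences object can contain.

```python
from __future__ import annotations

from typing import Any


def describe_occurrences(occurrences: dict[str, Any]) -> list[str]:
    """Convert one occurrences object into human-readable location strings.

    Hypothetical helper; only the cells, lineRanges, pages, and records
    arrays are handled.
    """
    locations: list[str] = []
    # Excel workbooks, CSV files, and TSV files: row and column of a cell or field.
    for cell in occurrences.get("cells") or []:
        locations.append(f"row {cell.get('row')}, column {cell.get('column')}")
    # Other non-binary text files (HTML, TXT, XML, ...): line ranges.
    for line_range in occurrences.get("lineRanges") or []:
        locations.append(f"lines {line_range.get('start')}-{line_range.get('end')}")
    # PDF files: page numbers.
    for page in occurrences.get("pages") or []:
        locations.append(f"page {page.get('pageNumber')}")
    # JSON, JSON Lines, Avro, and Parquet files: record index and JSON path.
    for record in occurrences.get("records") or []:
        locations.append(
            f"record {record.get('recordIndex')}, path {record.get('jsonPath')}"
        )
    return locations
```

For example, an occurrences object with a single cells entry whose row is 2 and column is 3 yields the string "row 2, column 3".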

You can access these details by using the Amazon Macie console or the Amazon Macie API. You can also access them in findings that Macie publishes to other AWS services, namely Amazon EventBridge and AWS Security Hub. To learn how to access the details in findings that Macie publishes to other AWS services, see Monitoring and processing findings. To learn about the JSON structures that Macie uses to report the location of sensitive data, see JSON schema for sensitive data locations.

If an S3 object contains many occurrences of sensitive data, you can also use a finding to navigate to its corresponding sensitive data discovery result. Unlike a sensitive data finding, a sensitive data discovery result provides detailed location data for as many as 1,000 occurrences of each type of sensitive data that Macie finds in an object. If an S3 object is an archive file, such as a .tar or .zip file, this includes occurrences of sensitive data in individual files that Macie extracts from the archive. (Macie doesn’t include this information in sensitive data findings.) For more information about sensitive data discovery results, see Reviewing job statistics and results. Macie uses the same JSON schema for location data in sensitive data findings and sensitive data discovery results.
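One way to decide programmatically whether to consult the discovery result is to compare each detection's total count with the number of location entries that the finding actually reports. The following minimal Python sketch assumes that a finding is a dictionary in the structure that the GetFindings operation returns, and that each entry in an occurrences array represents one occurrence. The function names reported_vs_total and needs_discovery_result are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Arrays in an occurrences object; each entry is assumed to be one occurrence.
OCCURRENCE_KEYS = ("cells", "lineRanges", "offsetRanges", "pages", "records")


def _detections(finding: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect detections from managed and custom data identifiers."""
    details = finding.get("classificationDetails") or {}
    result = details.get("result") or {}
    detections: list[dict[str, Any]] = []
    # Managed data identifiers: detections grouped by sensitive data category.
    for category in result.get("sensitiveData") or []:
        detections.extend(category.get("detections") or [])
    # Custom data identifiers: a single list of detections.
    custom = result.get("customDataIdentifiers") or {}
    detections.extend(custom.get("detections") or [])
    return detections


def reported_vs_total(finding: dict[str, Any]) -> tuple[int, int]:
    """Return (occurrences with location details, total occurrences counted)."""
    reported = 0
    total = 0
    for detection in _detections(finding):
        total += detection.get("count", 0)
        occurrences = detection.get("occurrences") or {}
        reported += sum(len(occurrences.get(key) or []) for key in OCCURRENCE_KEYS)
    return reported, total


def needs_discovery_result(finding: dict[str, Any]) -> bool:
    """True if the finding locates fewer occurrences than Macie counted."""
    reported, total = reported_vs_total(finding)
    return reported < total
```

Because a finding reports the location of at most 15 occurrences, a finding like the one described later in this topic, with 90 occurrences in total, would return True.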

Locating occurrences of sensitive data

To locate occurrences of sensitive data, you can use the Amazon Macie console or the Amazon Macie API. The following steps explain how to locate sensitive data by using the console.

To locate sensitive data programmatically, use the GetFindings operation of the Amazon Macie API. If a finding includes details about the location of one or more occurrences of a specific type of sensitive data, an occurrences object in the finding provides those details. For more information, see JSON schema for sensitive data locations.
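The following minimal Python sketch walks the findings that GetFindings returns and yields each occurrences object along with the type of sensitive data it describes. It assumes the boto3 macie2 client and the classificationDetails.result structure of a finding. The client is passed in as a parameter so that the walk can be exercised with a stub; the function names iter_occurrences and fetch_with_boto3 are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_occurrences(
    client: Any, finding_ids: list[str]
) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Yield (finding ID, data type or identifier name, occurrences object)."""
    response = client.get_findings(findingIds=finding_ids)
    for finding in response.get("findings", []):
        result = (finding.get("classificationDetails") or {}).get("result") or {}
        # Managed data identifiers report a sensitive data type per detection.
        for category in result.get("sensitiveData") or []:
            for detection in category.get("detections") or []:
                if detection.get("occurrences"):
                    yield finding["id"], detection["type"], detection["occurrences"]
        # Custom data identifiers report the identifier's name instead of a type.
        custom = result.get("customDataIdentifiers") or {}
        for detection in custom.get("detections") or []:
            if detection.get("occurrences"):
                yield finding["id"], detection.get("name", ""), detection["occurrences"]


def fetch_with_boto3(finding_ids: list[str]) -> list[tuple[str, str, dict[str, Any]]]:
    """Call Macie with default AWS credentials (requires boto3)."""
    import boto3  # imported lazily so the walk above has no hard dependency

    return list(iter_occurrences(boto3.client("macie2"), finding_ids))
```

Detections without location details (for example, a type that Macie counted but did not locate) are skipped rather than yielded with an empty object.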

To locate occurrences of sensitive data

  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Findings.

    Tip

    You can use the Jobs page to display all the findings from a particular job. To do this, choose Jobs in the navigation pane, and then choose the name of the job. At the top of the details panel, choose Show results, and then choose Show findings.

  3. On the Findings page, choose the finding for the sensitive data that you want to locate. The details panel displays information for the finding.

  4. In the details panel, scroll to the Sensitive data section. This section provides information about the categories and types of sensitive data that Macie found in the affected S3 object. It also indicates the number of occurrences of each type of sensitive data that Macie found.

    For example, the following image shows some details of a finding that reports 30 occurrences of credit card numbers, 30 occurrences of names, and 30 occurrences of US Social Security numbers.

    [Image: The finding details panel with three fields. Each field shows the number of occurrences of a specific type of sensitive data, and each number is formatted as a link.]

    If the finding includes details about the location of one or more occurrences of a specific type of sensitive data, the number of occurrences is a link. Choose the link to show the details. Macie opens a new window and displays the details in JSON format.

    For example, the following image shows the location of two occurrences of credit card numbers in an affected object.

    [Image: A window that displays row and column details, in JSON format, for two occurrences of credit card numbers in an affected object.]

    To save the details as a JSON file, choose Download, and then specify a name and location for the file.

  5. (Optional) To save all the finding's details as a JSON file, choose the finding's identifier (Finding ID) at the top of the details panel. Macie opens a new window and displays all the details in JSON format. Choose Download, and then specify a name and location for the file.
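The Tip in step 2 shows how to display all the findings from a particular job on the console. A programmatic counterpart might use the ListFindings operation with a filter on the job ID and then pass the resulting IDs to GetFindings. The following Python sketch assumes the boto3 macie2 client, the classificationDetails.jobId filter field, and nextToken pagination; the function name list_job_finding_ids is hypothetical, and the client is a parameter so that it can be replaced with a stub.

```python
from __future__ import annotations

from typing import Any


def list_job_finding_ids(client: Any, job_id: str) -> list[str]:
    """Return the IDs of all findings that a specific job produced."""
    kwargs: dict[str, Any] = {
        # Only include findings whose classification details name this job.
        "findingCriteria": {
            "criterion": {"classificationDetails.jobId": {"eq": [job_id]}}
        },
    }
    finding_ids: list[str] = []
    while True:
        page = client.list_findings(**kwargs)
        finding_ids.extend(page.get("findingIds", []))
        token = page.get("nextToken")
        if not token:
            return finding_ids
        # Results are paginated; request the next page with the returned token.
        kwargs["nextToken"] = token
```

With a real client, you might call list_job_finding_ids(boto3.client("macie2"), job_id), where job_id is the unique identifier of your job.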

To access details about the location of as many as 1,000 occurrences of each type of sensitive data in the affected object, refer to the corresponding sensitive data discovery result for the finding. To do this, scroll to the beginning of the Details section of the panel, and then choose the link in the Detailed result location field. Macie opens the Amazon S3 console and displays the file or folder that contains the discovery result. To learn more about these results, see Reviewing job statistics and results.
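The value shown in the Detailed result location field is also available in a finding's classificationDetails.detailedResultsLocation field. The following minimal Python sketch assumes that this field holds an s3:// URI and splits it into a bucket name and an object key. The function name detailed_result_location is hypothetical.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse


def detailed_result_location(finding: dict[str, Any]) -> tuple[str, str] | None:
    """Split a finding's detailed result location into (bucket, key).

    Returns None if the finding has no detailed result location.
    """
    details = finding.get("classificationDetails") or {}
    uri = details.get("detailedResultsLocation")
    if not uri:
        return None
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri}")
    # The bucket is the URI's network location; the key is the path minus its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")
```

You could then pass the bucket and key to an Amazon S3 client's get_object call, provided that you have permission to read the object and to use the key that encrypts it.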