Monitoring sensitive data discovery jobs - Amazon Macie

Monitoring sensitive data discovery jobs

In addition to monitoring the overall status of a sensitive data discovery job, you can monitor and analyze specific events that occur as a job progresses. You can do this by using near real-time logging data that Amazon Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's progress or status, such as the exact date and time when a job started to run, was paused, or finished running.

The log data also provides details about any account- or bucket-level errors that occur while a job runs. For example, if the permissions settings for an S3 bucket prevent a job from analyzing objects in the bucket, Macie logs an event. The event indicates when the error occurred, and it identifies both the affected bucket and the account that owns the bucket. The data for these types of events can help you identify, investigate, and address errors that prevent Macie from analyzing the data that you want.

With Amazon CloudWatch Logs, you can monitor, store, and access log files from multiple systems, applications, and AWS services, including Macie. You can also query and analyze log data, and configure CloudWatch Logs to notify you when certain events occur or thresholds are met. CloudWatch Logs also provides features for archiving log data and exporting the data to Amazon S3. To learn more about CloudWatch Logs, see the Amazon CloudWatch Logs User Guide.

How logging works for sensitive data discovery jobs

When you start running sensitive data discovery jobs, Amazon Macie automatically creates and configures the appropriate resources in Amazon CloudWatch Logs to log events for all of your jobs in the current AWS Region. Macie then publishes event data to those resources automatically when your jobs run. The permissions policy for the Macie service-linked role for your account allows Macie to perform these tasks on your behalf. You don't need to take any steps to create or configure resources in CloudWatch Logs or log event data for your jobs.

In CloudWatch Logs, logs are organized into log groups. Each log group contains log streams. Each log stream contains log events. The general purpose of each of these resources is as follows:

  • A log group is a collection of log streams that share the same retention, monitoring, and access control settings—for example, the collection of logs for all of your sensitive data discovery jobs.

  • A log stream is a sequence of log events that share the same source—for example, an individual sensitive data discovery job.

  • A log event is a record of an activity that was recorded by an application or resource—for example, an individual event that Macie recorded and published for a particular sensitive data discovery job.

Macie publishes events for all of your sensitive data discovery jobs to one log group, and each job has a unique log stream in that log group. The log group has the following prefix and name:

/aws/macie/classificationjobs

If this log group already exists, Macie uses it to store log events for your jobs. This can be helpful if your organization uses automated configuration, such as AWS CloudFormation, to create log groups with predefined log retention periods, encryption settings, tags, metric filters, and so on for job events.

If this log group doesn't exist, Macie creates it with the default settings that CloudWatch Logs uses for new log groups. The settings include a log retention period of Never Expire, which means that CloudWatch Logs stores the logs indefinitely. To change the retention period for the log group, you can use the Amazon CloudWatch console or the Amazon CloudWatch Logs API. To learn how, see Working with log groups and log streams in the Amazon CloudWatch Logs User Guide.

Within this log group, Macie creates a unique log stream for each job that you run, the first time that the job runs. The name of the log stream is the unique identifier for the job, such as 85a55dc0fa6ed0be5939d0408example. The full path to the stream therefore has the following format:

/aws/macie/classificationjobs/85a55dc0fa6ed0be5939d0408example

Each log stream contains all the log events that Macie recorded and published for the corresponding job. For periodic jobs, this includes events for all of the job's runs. If you delete the log stream for a periodic job, Macie creates the stream again the next time that the job runs. If you delete the log stream for a one-time job, you can't restore it.

Note that logging is enabled by default for all of your jobs. You can't disable it or otherwise prevent Macie from publishing job events to CloudWatch Logs. If you don't want to store the logs, you can reduce the retention period for the log group to as little as one day. At the end of the retention period, CloudWatch Logs automatically deletes expired event data from the log group.
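As a minimal sketch of reducing the retention period through the CloudWatch Logs API, the following Python function calls the PutRetentionPolicy operation on the job log group. It assumes that logs_client is a CloudWatch Logs client, such as one created with boto3.client("logs"). The function name set_job_log_retention is hypothetical. Note that CloudWatch Logs accepts only certain values for the retention period; one day is among them.

```python
LOG_GROUP = "/aws/macie/classificationjobs"


def set_job_log_retention(logs_client, days=1):
    """Set the retention period, in days, for the Macie job log group.

    logs_client is assumed to be a CloudWatch Logs client, for example
    boto3.client("logs"). It is passed in so the caller controls the
    Region and credentials.
    """
    if days < 1:
        raise ValueError("The retention period must be at least one day.")
    # PutRetentionPolicy rejects values that aren't in its list of
    # supported retention periods, so an API error is still possible here.
    logs_client.put_retention_policy(logGroupName=LOG_GROUP, retentionInDays=days)
    return days
```

For example, set_job_log_retention(boto3.client("logs"), days=1) would request the shortest retention period that this topic mentions.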

Reviewing logs for sensitive data discovery jobs

You can review the logs for your sensitive data discovery jobs by using the Amazon CloudWatch console or the Amazon CloudWatch Logs API. Both the console and the API provide features that are designed to help you review and analyze log data. You can use these features to work with log streams and events for your jobs as you would work with any other type of log data in CloudWatch Logs.

For example, you can search and filter aggregate data to identify specific types of events that occurred for all of your jobs during a specific time range. Or you can perform a targeted review of all the events that occurred for a particular job. CloudWatch Logs also provides options for monitoring log data, defining metric filters, and creating custom alarms.
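The following Python sketch illustrates both approaches with the FilterLogEvents operation of the CloudWatch Logs API. It assumes that logs_client is a CloudWatch Logs client (for example, boto3.client("logs")) and that the message of each log event is the JSON object described later in this topic. The function names are hypothetical.

```python
import json

LOG_GROUP = "/aws/macie/classificationjobs"


def event_type_pattern(event_type):
    """Build a JSON filter pattern that matches a single eventType value."""
    return '{ $.eventType = "%s" }' % event_type


def find_job_events(logs_client, event_type, start_ms, end_ms, job_id=None):
    """Return parsed job events of one type that occurred in a time range.

    start_ms and end_ms are milliseconds since the Unix epoch. If job_id is
    given, the search is limited to that job's log stream; otherwise all
    log streams in the log group are searched.
    """
    kwargs = {
        "logGroupName": LOG_GROUP,
        "filterPattern": event_type_pattern(event_type),
        "startTime": start_ms,
        "endTime": end_ms,
    }
    if job_id is not None:
        # The name of a job's log stream is the job's unique identifier.
        kwargs["logStreamNames"] = [job_id]

    events = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        events.extend(json.loads(item["message"]) for item in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        # Results can span several pages; pass the token to get the next one.
        kwargs["nextToken"] = token
```

Passing the client as a parameter keeps the Region and credentials under the caller's control, which matters because Macie logs job events separately in each AWS Region.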

Tip

To navigate to the log events for a particular job by using the Amazon Macie console, do the following: On the Jobs page, choose the name of the job. At the top of the details panel, choose Show results, and then choose Show CloudWatch logs. Macie opens the Amazon CloudWatch console and displays a table of log events for the job.

To review logs for sensitive data discovery jobs
  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. By using the AWS Region selector in the upper-right corner of the page, select the Region in which you ran the jobs whose logs you want to review.

  3. In the navigation pane, choose Logs, and then choose Log groups.

  4. On the Log groups page, choose the /aws/macie/classificationjobs log group. CloudWatch Logs displays a table of log streams for the jobs that you've run. There is one unique stream for each job. The name of each stream is the unique identifier for the corresponding job.

  5. Under Log streams, do one of the following:

    • To review the log events for a particular job, choose the log stream for the job. To find the stream more easily, enter the job's unique identifier in the filter box above the table. After you choose the log stream, CloudWatch Logs displays a table of log events for the job.

    • To review log events for all of your jobs, choose Search all log streams. CloudWatch Logs displays a table of log events for all of your jobs.

  6. (Optional) In the filter box above the table, enter terms, phrases, or values that specify characteristics of specific events to review. For more information, see Search log data using filter patterns in the Amazon CloudWatch Logs User Guide.

  7. To review the details of a specific log event, choose the expand icon (a right-facing solid arrow) in the row for the event. CloudWatch Logs displays the event's details in JSON format.

As you familiarize yourself with the data in the log events, you can perform additional tasks to streamline analysis and monitoring of the data. For example, you can create metric filters that turn log data into numerical CloudWatch metrics. You can also create custom alarms that make it easier to identify and respond to specific log events. For more information, see the Amazon CloudWatch Logs User Guide.
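As one possible illustration of a metric filter, the following Python sketch builds the parameters for the PutMetricFilter operation of the CloudWatch Logs API so that each access-denied job event increments a custom metric by one. The filter name, metric name, and namespace are hypothetical values chosen for this example, and logs_client is assumed to be a CloudWatch Logs client such as boto3.client("logs").

```python
LOG_GROUP = "/aws/macie/classificationjobs"

# JSON filter pattern that matches either type of access-denied job event.
ACCESS_DENIED_PATTERN = (
    '{ ($.eventType = "ACCOUNT_ACCESS_DENIED") || '
    '($.eventType = "BUCKET_ACCESS_DENIED") }'
)


def access_denied_metric_filter(namespace="Custom/MacieJobs"):
    """Build PutMetricFilter parameters that count access-denied job events."""
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "macie-job-access-denied",  # hypothetical name
        "filterPattern": ACCESS_DENIED_PATTERN,
        "metricTransformations": [
            {
                "metricName": "AccessDeniedEvents",  # hypothetical name
                "metricNamespace": namespace,
                "metricValue": "1",  # add 1 to the metric per matching event
            }
        ],
    }


def create_access_denied_metric_filter(logs_client, namespace="Custom/MacieJobs"):
    """Create or update the metric filter on the Macie job log group."""
    logs_client.put_metric_filter(**access_denied_metric_filter(namespace))
```

A CloudWatch alarm on the resulting metric could then notify you when a job encounters a permissions error.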

Log event schema for sensitive data discovery jobs

Each log event for a sensitive data discovery job is a JSON object that conforms to the Amazon CloudWatch Logs event schema and contains a standard set of fields. Some types of events have additional fields that provide information that's particularly useful for that type of event. For example, events for account-level errors include the account ID for the affected AWS account. Events for bucket-level errors include the name of the affected S3 bucket. For a detailed list of job events that Amazon Macie publishes to CloudWatch Logs, see Types of log events for jobs.

The following example shows the log event schema for sensitive data discovery jobs. In this example, the event reports that Macie wasn't able to analyze any objects in an S3 bucket because Amazon S3 denied access to the bucket.

{
  "adminAccountId": "123456789012",
  "jobId": "85a55dc0fa6ed0be5939d0408example",
  "eventType": "BUCKET_ACCESS_DENIED",
  "occurredAt": "2021-04-14T17:11:30.574809Z",
  "description": "Macie doesn’t have permission to access the affected S3 bucket.",
  "jobName": "My_Macie_Job",
  "operation": "ListObjectsV2",
  "runDate": "2021-04-14T17:08:30.345809Z",
  "affectedAccount": "111122223333",
  "affectedResource": {
    "type": "S3_BUCKET_NAME",
    "value": "amzn-s3-demo-bucket"
  }
}

In the preceding example, Macie attempted to list the objects in the bucket by using the ListObjectsV2 operation of the Amazon S3 API. When Macie sent the request to Amazon S3, Amazon S3 denied access to the bucket.

The following fields are common to all log events for sensitive data discovery jobs:

  • adminAccountId – The unique identifier for the AWS account that created the job.

  • jobId – The unique identifier for the job.

  • eventType – The type of event that occurred. For a complete list of possible values and a description of each one, see Types of log events for jobs.

  • occurredAt – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the event occurred.

  • description – A brief description of the event.

  • jobName – The custom name of the job.

Depending on the type and nature of an event, a log event can also contain the following fields:

  • affectedAccount – The unique identifier for the AWS account that owns the affected resource.

  • affectedResource – An object that provides details about the affected resource. In the object, the type field identifies the kind of metadata that's stored about the resource, such as S3_BUCKET_NAME. The value field specifies the value for that type, such as the name of the bucket.

  • operation – The operation that Macie attempted to perform and that caused the error.

  • runDate – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the applicable job or job run started.
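To make the distinction between common and optional fields concrete, the following Python sketch parses the JSON text of a job log event into a small data class. The class and function names are hypothetical. The field names come from the lists above, and the optional fields default to None when an event doesn't include them.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Fields that are common to all log events for jobs.
COMMON_FIELDS = (
    "adminAccountId", "jobId", "eventType",
    "occurredAt", "description", "jobName",
)


@dataclass
class JobLogEvent:
    admin_account_id: str
    job_id: str
    event_type: str
    occurred_at: str
    description: str
    job_name: str
    # Fields that appear only for certain types of events.
    affected_account: Optional[str] = None
    resource_type: Optional[str] = None
    resource_value: Optional[str] = None
    operation: Optional[str] = None
    run_date: Optional[str] = None


def parse_job_log_event(message: str) -> JobLogEvent:
    """Parse the JSON text of one job log event."""
    data = json.loads(message)
    missing = [name for name in COMMON_FIELDS if name not in data]
    if missing:
        raise ValueError(f"Event is missing common fields: {missing}")
    resource = data.get("affectedResource") or {}
    return JobLogEvent(
        admin_account_id=data["adminAccountId"],
        job_id=data["jobId"],
        event_type=data["eventType"],
        occurred_at=data["occurredAt"],
        description=data["description"],
        job_name=data["jobName"],
        affected_account=data.get("affectedAccount"),
        resource_type=resource.get("type"),
        resource_value=resource.get("value"),
        operation=data.get("operation"),
        run_date=data.get("runDate"),
    )
```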

Types of log events for sensitive data discovery jobs

Amazon Macie publishes log events for three categories of events:

  • Job status events, which record changes to the status or progress of a job or a job run.

  • Account-level error events, which record errors that prevented Macie from analyzing Amazon S3 data for a specific AWS account.

  • Bucket-level error events, which record errors that prevented Macie from analyzing data in a specific S3 bucket.

The topics in this section list and describe the types of events that Macie publishes for each category.
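For automated processing, it can help to map each eventType value to one of these three categories. The following Python sketch does so with explicit sets that contain only the event types listed in the tables in this section. The category labels and the function name are hypothetical. Explicit sets are used instead of prefix matching because BUCKET_MATCHED_THE_CRITERIA is a job status event even though its name starts with BUCKET_.

```python
JOB_STATUS_EVENTS = frozenset({
    "JOB_CREATED",
    "ONE_TIME_JOB_STARTED",
    "SCHEDULED_RUN_STARTED",
    "BUCKET_MATCHED_THE_CRITERIA",
    "NO_BUCKETS_MATCHED_THE_CRITERIA",
    "SCHEDULED_RUN_COMPLETED",
    "JOB_PAUSED_BY_USER",
    "JOB_RESUMED_BY_USER",
    "JOB_PAUSED_BY_MACIE_SERVICE_QUOTA_MET",
    "JOB_RESUMED_BY_MACIE_SERVICE_QUOTA_LIFTED",
    "JOB_CANCELLED",
    "JOB_COMPLETED",
})

ACCOUNT_ERROR_EVENTS = frozenset({
    "ACCOUNT_ACCESS_DENIED",
    "ACCOUNT_DISABLED",
    "ACCOUNT_DISASSOCIATED",
    "ACCOUNT_ISOLATED",
    "ACCOUNT_REGION_DISABLED",
    "ACCOUNT_SUSPENDED",
    "ACCOUNT_TERMINATED",
})

BUCKET_ERROR_EVENTS = frozenset({
    "BUCKET_ACCESS_DENIED",
    "BUCKET_DETAILS_UNAVAILABLE",
    "BUCKET_DOES_NOT_EXIST",
    "BUCKET_IN_DIFFERENT_REGION",
    "BUCKET_OWNER_CHANGED",
})


def event_category(event_type):
    """Return the category for an eventType value, or "unknown"."""
    if event_type in JOB_STATUS_EVENTS:
        return "job_status"
    if event_type in ACCOUNT_ERROR_EVENTS:
        return "account_error"
    if event_type in BUCKET_ERROR_EVENTS:
        return "bucket_error"
    # Event types that aren't listed in this topic fall through to here.
    return "unknown"
```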

Job status events

A job status event records a change to the status or progress of a job or a job run. For periodic jobs, Macie logs and publishes these events for both the overall job and individual job runs. For information about determining the overall status of a job, see Checking the status of sensitive data discovery jobs.

The following example uses sample data to show the structure and nature of the fields in a job status event. In this example, a SCHEDULED_RUN_COMPLETED event indicates that a scheduled run of a periodic job finished running. The run started on April 14, 2021, at 17:09:30 UTC, as indicated by the runDate field. The run finished on April 14, 2021, at 17:16:30 UTC, as indicated by the occurredAt field.

{
  "adminAccountId": "123456789012",
  "jobId": "ffad0e71455f38a4c7c220f3cexample",
  "eventType": "SCHEDULED_RUN_COMPLETED",
  "occurredAt": "2021-04-14T17:16:30.574809Z",
  "description": "The scheduled job run finished running.",
  "jobName": "My_Daily_Macie_Job",
  "runDate": "2021-04-14T17:09:30.574809Z"
}
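Because the runDate and occurredAt fields both use UTC and extended ISO 8601 format, the difference between them in a completion event gives the duration of the run. The following Python sketch, with hypothetical function names, computes that difference for an event that has already been parsed into a dictionary.

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp such as 2021-04-14T17:16:30.574809Z."""
    # Python versions earlier than 3.11 don't accept a trailing "Z" in
    # datetime.fromisoformat, so replace it with an explicit UTC offset.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def run_duration(event: dict) -> timedelta:
    """Return the time between the start of a run and the event."""
    return parse_utc(event["occurredAt"]) - parse_utc(event["runDate"])
```

For the sample event in this section, the result is a duration of seven minutes.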

The following table lists and describes the types of job status events that Macie logs and publishes to CloudWatch Logs. The Event type column indicates the name of each event as it appears in the eventType field of an event. The Description column provides a brief description of the event as it appears in the description field of an event. The Additional information column provides information about the type of job that the event applies to. The table is sorted first by the general chronological order in which events might occur, and then in ascending alphabetical order by event type.

| Event type | Description | Additional information |
| JOB_CREATED | The job was created. | Applies to one-time and periodic jobs. |
| ONE_TIME_JOB_STARTED | The job started running. | Applies only to one-time jobs. |
| SCHEDULED_RUN_STARTED | The scheduled job run started running. | Applies only to periodic jobs. To log the start of a one-time job, Macie publishes a ONE_TIME_JOB_STARTED event, not this type of event. |
| BUCKET_MATCHED_THE_CRITERIA | The affected bucket matched the bucket criteria specified for the job. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. The affectedResource object specifies the name of the bucket that matched the criteria and was included in the job's analysis. |
| NO_BUCKETS_MATCHED_THE_CRITERIA | The job started running but no buckets currently match the bucket criteria specified for the job. The job didn't analyze any data. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. |
| SCHEDULED_RUN_COMPLETED | The scheduled job run finished running. | Applies only to periodic jobs. To log completion of a one-time job, Macie publishes a JOB_COMPLETED event, not this type of event. |
| JOB_PAUSED_BY_USER | The job was paused by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused). |
| JOB_RESUMED_BY_USER | The job was resumed by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused) and subsequently resumed. |
| JOB_PAUSED_BY_MACIE_SERVICE_QUOTA_MET | The job was paused by Macie. Completion of the job would exceed a monthly quota for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused). Macie automatically pauses a job when additional processing by the job or a job run would exceed the monthly sensitive data discovery quota for one or more accounts that the job analyzes data for. To avoid this issue, consider increasing the quota for the affected accounts. |
| JOB_RESUMED_BY_MACIE_SERVICE_QUOTA_LIFTED | The job was resumed by Macie. The monthly service quota was lifted for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused) and subsequently resumed. If Macie automatically paused a one-time job, Macie automatically resumes the job when the subsequent month starts or the monthly sensitive data discovery quota is increased for all the affected accounts, whichever occurs first. If Macie automatically paused a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the subsequent month starts, whichever occurs first. |
| JOB_CANCELLED | The job was cancelled. | Applies to one-time and periodic jobs that you stopped permanently (cancelled) or, for one-time jobs, paused and didn't resume within 30 days. If you suspend or disable Macie, this type of event also applies to jobs that were active or paused when you suspended or disabled Macie. Macie automatically cancels your jobs in an AWS Region if you suspend or disable Macie in the Region. |
| JOB_COMPLETED | The job finished running. | Applies only to one-time jobs. To log completion of a job run for a periodic job, Macie publishes a SCHEDULED_RUN_COMPLETED event, not this type of event. |

Account-level error events

An account-level error event records an error that prevented Macie from analyzing objects in S3 buckets that are owned by a specific AWS account. The affectedAccount field in each event specifies the account ID for that account.

The following example uses sample data to show the structure and nature of the fields in an account-level error event. In this example, an ACCOUNT_ACCESS_DENIED event indicates that Macie wasn't able to analyze objects in any S3 buckets that are owned by account 444455556666.

{
  "adminAccountId": "123456789012",
  "jobId": "85a55dc0fa6ed0be5939d0408example",
  "eventType": "ACCOUNT_ACCESS_DENIED",
  "occurredAt": "2021-04-14T17:08:30.585709Z",
  "description": "Macie doesn’t have permission to access S3 bucket data for the affected account.",
  "jobName": "My_Macie_Job",
  "operation": "ListBuckets",
  "runDate": "2021-04-14T17:05:27.574809Z",
  "affectedAccount": "444455556666"
}

The following table lists and describes the types of account-level error events that Macie logs and publishes to CloudWatch Logs. The Event type column indicates the name of each event as it appears in the eventType field of an event. The Description column provides a brief description of the event as it appears in the description field of an event. The Additional information column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.

| Event type | Description | Additional information |
| ACCOUNT_ACCESS_DENIED | Macie doesn’t have permission to access S3 bucket data for the affected account. | This typically occurs because the buckets that are owned by the account have restrictive bucket policies. For information about how to address this issue, see Allowing Macie to access S3 buckets and objects. The value for the operation field in the event can help you determine which permissions settings prevented Macie from accessing S3 data for the account. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| ACCOUNT_DISABLED | The job skipped resources that are owned by the affected account. Macie was disabled for the account. | To address this issue, re-enable Macie for the account in the same AWS Region. |
| ACCOUNT_DISASSOCIATED | The job skipped resources that are owned by the affected account. The account isn't associated with your Macie administrator account as a member account anymore. | This occurs if you, as a Macie administrator for an organization, configure a job to analyze data for an associated member account and the member account is subsequently removed from your organization. To address this issue, re-associate the affected account with your Macie administrator account as a member account. For more information, see Managing multiple accounts. |
| ACCOUNT_ISOLATED | The job skipped resources that are owned by the affected account. The AWS account was isolated. | |
| ACCOUNT_REGION_DISABLED | The job skipped resources that are owned by the affected account. The AWS account isn't active in the current AWS Region. | |
| ACCOUNT_SUSPENDED | The job was cancelled or skipped resources that are owned by the affected account. Macie was suspended for the account. | If the specified account is your own account, Macie automatically cancelled the job when you suspended Macie in the same Region. To address the issue, re-enable Macie in the Region. If the specified account is a member account, re-enable Macie for that account in the same Region. |
| ACCOUNT_TERMINATED | The job skipped resources that are owned by the affected account. The AWS account was terminated. | |

Bucket-level error events

A bucket-level error event records an error that prevented Macie from analyzing objects in a specific S3 bucket. The affectedAccount field in each event specifies the account ID for the AWS account that owns the bucket. The affectedResource object in each event specifies the name of the bucket.

The following example uses sample data to show the structure and nature of the fields in a bucket-level error event. In this example, a BUCKET_ACCESS_DENIED event indicates that Macie wasn't able to analyze any objects in the S3 bucket named amzn-s3-demo-bucket. When Macie attempted to list the objects in the bucket by using the ListObjectsV2 operation of the Amazon S3 API, Amazon S3 denied access to the bucket.

{
  "adminAccountId": "123456789012",
  "jobId": "85a55dc0fa6ed0be5939d0408example",
  "eventType": "BUCKET_ACCESS_DENIED",
  "occurredAt": "2021-04-14T17:11:30.574809Z",
  "description": "Macie doesn’t have permission to access the affected S3 bucket.",
  "jobName": "My_Macie_Job",
  "operation": "ListObjectsV2",
  "runDate": "2021-04-14T17:09:30.685209Z",
  "affectedAccount": "111122223333",
  "affectedResource": {
    "type": "S3_BUCKET_NAME",
    "value": "amzn-s3-demo-bucket"
  }
}
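When a job analyzes many buckets, it can be useful to summarize bucket-level errors by bucket. The following Python sketch groups a list of parsed events (dictionaries) by the owning account and bucket name, using the affectedAccount and affectedResource fields. The function name is hypothetical, and the set of event types contains only the bucket-level error events that this topic lists.

```python
from collections import defaultdict

BUCKET_ERROR_EVENTS = frozenset({
    "BUCKET_ACCESS_DENIED",
    "BUCKET_DETAILS_UNAVAILABLE",
    "BUCKET_DOES_NOT_EXIST",
    "BUCKET_IN_DIFFERENT_REGION",
    "BUCKET_OWNER_CHANGED",
})


def bucket_errors(events):
    """Map (account ID, bucket name) to a sorted list of error event types."""
    grouped = defaultdict(set)
    for event in events:
        # Skip job status and account-level events, which aren't bucket errors.
        if event.get("eventType") not in BUCKET_ERROR_EVENTS:
            continue
        resource = event.get("affectedResource") or {}
        key = (event.get("affectedAccount"), resource.get("value"))
        grouped[key].add(event["eventType"])
    return {key: sorted(types) for key, types in grouped.items()}
```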

The following table lists and describes the types of bucket-level error events that Macie logs and publishes to CloudWatch Logs. The Event type column indicates the name of each event as it appears in the eventType field of an event. The Description column provides a brief description of the event as it appears in the description field of an event. The Additional information column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.

| Event type | Description | Additional information |
| BUCKET_ACCESS_DENIED | Macie doesn’t have permission to access the affected S3 bucket. | This typically occurs because a bucket has a restrictive bucket policy. For information about how to address this issue, see Allowing Macie to access S3 buckets and objects. The value for the operation field in the event can help you determine which permissions settings prevented Macie from accessing the bucket. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| BUCKET_DETAILS_UNAVAILABLE | A temporary issue prevented Macie from retrieving details about the bucket and the bucket’s objects. | This occurs if a transient issue prevented Macie from retrieving the bucket and object metadata that it needs to analyze a bucket's objects. For example, an Amazon S3 exception occurred when Macie tried to verify that it's allowed to access the bucket. To address the issue for a one-time job, consider creating and running a new, one-time job to analyze objects in the bucket. For a scheduled job, Macie will try to retrieve the metadata again during the next job run. |
| BUCKET_DOES_NOT_EXIST | The affected S3 bucket doesn’t exist anymore. | This typically occurs because a bucket was deleted. |
| BUCKET_IN_DIFFERENT_REGION | The affected S3 bucket was moved to a different AWS Region. | |
| BUCKET_OWNER_CHANGED | The owner of the affected S3 bucket changed. Macie doesn’t have permission to access the bucket anymore. | This typically occurs if ownership of a bucket was transferred to an AWS account that isn't part of your organization. The affectedAccount field in the event indicates the account ID for the account that previously owned the bucket. |
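For access-denied events, the operation field names the Amazon S3 operation that failed. As a rough aid for investigating permissions settings, the following Python sketch maps the two operations that appear in this topic's examples to the Amazon S3 permission that each operation requires. The mapping and the function name are illustrative assumptions. This isn't a complete list of the operations that Macie performs, and events might report operations that the mapping doesn't cover.

```python
# Amazon S3 permission that each operation requires. Only the operations
# that appear in this topic's sample events are included.
OPERATION_TO_PERMISSION = {
    "ListBuckets": "s3:ListAllMyBuckets",
    "ListObjectsV2": "s3:ListBucket",
}


def permission_hint(event):
    """Return the S3 permission to check for an access-denied event.

    Returns None if the event isn't an access-denied event or if its
    operation isn't in the mapping above.
    """
    if event.get("eventType") not in ("ACCOUNT_ACCESS_DENIED", "BUCKET_ACCESS_DENIED"):
        return None
    return OPERATION_TO_PERMISSION.get(event.get("operation"))
```

A hint of this kind only narrows the search. The bucket policy and any other applicable permissions settings still need to be reviewed to find the statement that denies access.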