Managing sensitive data discovery jobs - Amazon Macie

Managing sensitive data discovery jobs

To help you manage your sensitive data discovery jobs, Amazon Macie provides a complete inventory of your jobs in each AWS Region. With this inventory, you can manage your jobs as a single collection, and access the configuration settings, status, and processing statistics for individual jobs. You can also access the sensitive data findings and other results that each job produced.

In addition to these tasks, you can create custom variations of individual jobs: copy an existing job, adjust the settings for the copy, and then save the copy as a new job. This can be helpful when you want to analyze different sets of data in the same way, or the same set of data in different ways. It's also how you change the configuration of an existing job: cancel the existing job, copy it, and then adjust and save the copy as a new job.

Reviewing your inventory of sensitive data discovery jobs

The Jobs page on the Amazon Macie console provides information about all the sensitive data discovery jobs for your account in the current AWS Region. For each job, the table displays summary information that includes the current status of the job, whether the job runs on a scheduled, periodic basis, and whether the job analyzes a specific set of S3 buckets or S3 buckets that match runtime criteria. If you choose a job in the table, the details panel displays the configuration settings and other information about the job.

To review your job inventory
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs. The Jobs page opens and displays the number of jobs in your inventory and a table of those jobs.

  3. To find a specific job more quickly, do any of the following:

    • To sort the table by a specific field, click the column heading for the field. To change the sort order, click the column heading again.

    • To show only those jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose Apply.

    • To hide jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose Apply. In the filter box, choose the equals icon (a solid, dark gray circle) for the filter. This changes the filter's operator from equals to not equals (an empty, dark gray circle with a backslash).

    • To remove a filter, choose the remove filter icon (a circle with an X in it) for the filter that you want to remove.

  4. To review the configuration settings and other details for a particular job, choose the job's name in the table, and then refer to the details panel.
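
The same inventory can also be retrieved programmatically. The following is a minimal sketch, assuming the Macie ListClassificationJobs operation as exposed by boto3 (`list_classification_jobs`). The helper names `build_filter` and `list_jobs` are hypothetical, and `client` is assumed to be a `boto3.client("macie2")` object for the Region that you want to inspect.

```python
def build_filter(field, values, hide=False):
    """Build filterCriteria that shows (or hides) jobs with a given field value.

    field is a filterable key such as "jobStatus" or "jobType".
    hide=True mirrors the console's "not equals" operator.
    """
    term = {"key": field, "comparator": "EQ", "values": list(values)}
    return {"excludes": [term]} if hide else {"includes": [term]}


def list_jobs(client, filter_criteria=None, sort_by="createdAt", order="DESC"):
    """Return every job summary for the account in the client's Region."""
    request = {"sortCriteria": {"attributeName": sort_by, "orderBy": order}}
    if filter_criteria:
        request["filterCriteria"] = filter_criteria
    jobs = []
    while True:
        page = client.list_classification_jobs(**request)
        jobs.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return jobs
        request["nextToken"] = token  # ask for the next page of results


# Example (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("macie2")
#   criteria = build_filter("jobStatus", ["CANCELLED"], hide=True)
#   for job in list_jobs(client, criteria):
#       print(job["jobId"], job["name"], job["jobStatus"], job["jobType"])
```

Passing the client in as a parameter keeps the helpers independent of how credentials and the Region are configured.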

Reviewing configuration settings for sensitive data discovery jobs

On the Amazon Macie console, you can use the details panel on the Jobs page to review configuration settings and other information about individual sensitive data discovery jobs. For example, you can review a list of the S3 buckets that a job is configured to analyze and which managed data identifiers a job uses to analyze objects in those buckets.

Note

You can’t change any configuration settings for an existing job. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform. If you want to change an existing job, cancel the job. Then copy the job, configure the copy to use the settings that you want, and save the copy as a new job.

If you do this, you should also take steps to ensure that the new job doesn't analyze existing data in the same way again. To do this, note the date and time when you cancel the existing job. Then configure the scope of the new job to include only those objects that are created or changed after you cancel the original job. For example, use object criteria to add a Last modified condition that limits the job to objects changed after the date and time when you cancelled the original job.

To review a job's configuration settings
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, choose the name of the job whose settings you want to review. The details panel displays the configuration settings and other information about the job. Depending on the job's settings, the panel contains the following sections.

    General information

    This section provides general information about the job—for example, the Amazon Resource Name (ARN) of the job, when the job most recently started to run, and the current status of the job. If you paused the job, this section also indicates when you paused the job, and when the job or latest job run either expired or will expire if you don't resume it.

    Statistics

    This section shows processing statistics for the job—for example, the number of times that the job has run and the approximate number of objects that the job has yet to process during its current run.

    Scope

    This section indicates how often the job runs. It also shows settings that refine the job's scope—for example, the sampling depth and any object criteria that include or exclude S3 objects from the job's analysis.

    S3 buckets

    This section appears in the panel if the job is configured to analyze buckets that you explicitly selected when you created the job. It indicates the number of AWS accounts that the job is configured to analyze data for. It also indicates the number of buckets that the job is configured to analyze and the names of those buckets (grouped by account).

    To show the complete list of accounts and buckets in JSON format, choose the number in the Total buckets field.

    S3 bucket criteria

    This section appears in the panel if the job uses runtime criteria to determine which buckets to analyze. It lists the criteria that the job is configured to use.

    To show the criteria in JSON format, choose Details, and then choose the Criteria tab in the window that appears.

    To review a table of buckets that currently match the criteria, choose Details, and then choose the Matching buckets tab in the window that appears. Optionally choose the refresh button (an empty, dark gray circle with an arrow) to retrieve the latest data.

    Tip

    If the job has already run, you can also determine whether any buckets matched the criteria when the job ran and, if so, the names of those buckets. To do this, review log events for the job: choose Show results at the top of the panel, and then choose Show CloudWatch logs. Macie opens the Amazon CloudWatch console and displays a table of log events for the job. The events include a BUCKET_MATCHED_THE_CRITERIA event for each bucket that matched the criteria and was included in the job's analysis. For more information, see Monitoring jobs.

    Custom data identifiers

    This section appears in the panel if the job is configured to use one or more custom data identifiers. It specifies the names of those custom data identifiers.

    Allow lists

    This section appears in the panel if the job is configured to use one or more allow lists. It specifies the names of those lists. To review the settings and status of a list, choose the link icon (a blue box with an arrow) next to the list's name.

    Managed data identifiers

    This section indicates which managed data identifiers the job is configured to use. This is determined by the managed data identifier selection type for the job:

    • Recommended – Use the managed data identifiers that are in the recommended set when the job runs.

    • Include selected – Use only the managed data identifiers listed in the Selections section.

    • Include all – Use all the managed data identifiers that are available when the job runs.

    • Exclude selected – Use all the managed data identifiers that are available when the job runs, except the ones listed in the Selections section.

    • Exclude all – Don't use any managed data identifiers. Use only the specified custom data identifiers.

    To review these settings in JSON format, choose Details.

    Tags

    This section appears in the panel if tags are associated with the job. It lists those tags.

    A tag is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see Tagging Amazon Macie resources.

  4. To review and save the job's settings in JSON format, choose the unique identifier for the job (Job ID) at the top of the panel, and then choose Download.
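
The details panel corresponds closely to what the DescribeClassificationJob operation returns. The following is a rough sketch of reading and saving those settings, assuming boto3's `describe_classification_job` and its documented response fields. The helpers `summarize_job` and `save_job_settings` are hypothetical names, and `client` is assumed to be a `boto3.client("macie2")` object. Note that the API returns IDs for custom data identifiers and allow lists, whereas the console shows their names.

```python
import json

# Console label for each managedDataIdentifierSelector value in the API.
SELECTION_LABELS = {
    "RECOMMENDED": "Recommended",
    "INCLUDE": "Include selected",
    "ALL": "Include all",
    "EXCLUDE": "Exclude selected",
    "NONE": "Exclude all",
}


def summarize_job(job):
    """Pick out details-panel fields from a DescribeClassificationJob response."""
    stats = job.get("statistics", {})
    definition = job.get("s3JobDefinition", {})
    selector = job.get("managedDataIdentifierSelector")
    return {
        "arn": job.get("jobArn"),
        "status": job.get("jobStatus"),
        "last_run_time": job.get("lastRunTime"),
        "number_of_runs": stats.get("numberOfRuns"),
        "objects_left_to_process": stats.get("approximateNumberOfObjectsToProcess"),
        "sampling_percentage": job.get("samplingPercentage"),
        # True when the job selects buckets with runtime criteria.
        "uses_bucket_criteria": "bucketCriteria" in definition,
        "managed_data_identifiers": SELECTION_LABELS.get(selector, selector),
        "custom_data_identifier_ids": job.get("customDataIdentifierIds", []),
        "allow_list_ids": job.get("allowListIds", []),
        "tags": job.get("tags", {}),
    }


def save_job_settings(client, job_id, path):
    """Write a job's settings to a JSON file and return a short summary."""
    job = client.describe_classification_job(jobId=job_id)
    job.pop("ResponseMetadata", None)  # boto3 bookkeeping, not job settings
    with open(path, "w", encoding="utf-8") as handle:
        # default=str renders datetime values such as lastRunTime as text.
        json.dump(job, handle, indent=2, default=str)
    return summarize_job(job)
```

The saved file is not guaranteed to match the console's Download output byte for byte; it is simply the API response serialized as JSON.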

Checking the status of sensitive data discovery jobs

When you create a sensitive data discovery job, its initial status is Active (Running) or Active (Idle), depending on the job's type and schedule. The job then passes through additional states, which you can monitor as the job progresses.

Tip

In addition to monitoring the overall status of a job, you can monitor specific events that occur as a job progresses. You can do this by using logging data that Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's status and details about any account- or bucket-level errors that occur while a job runs. For more information, see Monitoring jobs.
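
Those log events can also be read with the CloudWatch Logs API. The sketch below is illustrative only and rests on three assumptions that are not stated on this page: that Macie writes job events to a log group named `/aws/macie/classificationjobs`, that each job has its own log stream named after the job ID, and that each event is a JSON document with an `eventType` field. The helper `job_events` is a hypothetical name, and `logs_client` is assumed to be a `boto3.client("logs")` object.

```python
import json

# Assumption: log group that Macie uses for job events, with one log
# stream per job that is named after the job's ID.
LOG_GROUP = "/aws/macie/classificationjobs"


def job_events(logs_client, job_id, event_type=None):
    """Return parsed log events for a job, optionally limited to one event type."""
    request = {"logGroupName": LOG_GROUP, "logStreamNames": [job_id]}
    events = []
    while True:
        page = logs_client.filter_log_events(**request)
        for record in page.get("events", []):
            try:
                event = json.loads(record["message"])
            except (KeyError, ValueError):
                continue  # skip records that are not JSON
            if not isinstance(event, dict):
                continue
            # Assumption: the event type is stored in an "eventType" field.
            if event_type is None or event.get("eventType") == event_type:
                events.append(event)
        token = page.get("nextToken")
        if not token:
            return events
        request["nextToken"] = token  # fetch the next page of events


# Example (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   logs = boto3.client("logs")
#   matched = job_events(logs, "<job ID>", "BUCKET_MATCHED_THE_CRITERIA")
```

Filtering on the client side keeps the sketch independent of CloudWatch filter-pattern syntax, at the cost of downloading every event for the job.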

To check the status of a job
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, locate the job whose status you want to check. The Status field indicates the current status of the job.

    Active (Idle)

    For a periodic job, the previous run is complete and the next scheduled run is pending. This value doesn't apply to one-time jobs.

    Active (Running)

    For a one-time job, the job is currently in progress. For a periodic job, a scheduled run is in progress.

    Cancelled

    For any type of job, the job was stopped permanently (cancelled).

    A job has this status if you explicitly cancelled it or, if it's a one-time job, you paused the job and didn't resume it within 30 days. A job can also have this status if you previously suspended Macie in the current AWS Region.

    Complete

    For a one-time job, the job ran successfully and is now complete. This value doesn't apply to periodic jobs. Instead, the status of a periodic job changes to Active (Idle) when each run completes successfully.

    Paused (By Macie)

    For any type of job, the job was stopped temporarily (paused) by Macie.

    A job has this status if completion of the job or a job run would exceed the monthly sensitive data discovery quota for your account. When this happens, Macie automatically pauses the job. Macie automatically resumes the job when the next calendar month starts (and the monthly quota is reset for your account) or you increase the quota for your account.

    If you’re the Macie administrator for an organization and you configured the job to analyze data for member accounts, the job can also have this status if completion of the job or a job run would exceed the monthly sensitive data discovery quota for a member account.

    If a job is running and the analysis of eligible objects reaches this quota for a member account, the job stops analyzing objects that are owned by the account. When the job finishes analyzing objects for all other accounts that haven’t met the quota, Macie automatically pauses the job. If it’s a one-time job, Macie automatically resumes the job when the next calendar month starts or the quota is increased for all the affected accounts, whichever occurs first. If it’s a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the next calendar month starts, whichever occurs first. If a scheduled run starts before the next calendar month starts or the quota is increased for an affected account, the job doesn’t analyze objects that are owned by the account.

    Paused (By user)

    For any type of job, the job was stopped temporarily (paused) by you.

    If you pause a one-time job and you don't resume it within 30 days, the job expires and Macie cancels it. If you pause a periodic job while it's actively running and you don't resume it within 30 days, the job's run expires and Macie cancels the run. To check the expiration date for a paused job or job run, choose the job's name in the table, and then refer to the Expires field in the Status details section of the details panel.

If a job is cancelled or paused, you can refer to the job's details to determine whether the job started to run or, for a periodic job, ran at least once before it was cancelled or paused. To do this, choose the job's name in the table, and then refer to the details panel. In the panel, the Number of runs field indicates the number of times that the job has run. The Last run time field indicates the most recent date and time when the job started to run.
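
When a job is inspected through the API rather than the console, the status arrives as an enumerated `jobStatus` value. The following sketch maps those values to the console labels described above and collects the run and expiration details in one place. It assumes a response from boto3's `describe_classification_job`; the helper name `describe_status` is hypothetical.

```python
# Console label for each jobStatus value returned by the API.
STATUS_LABELS = {
    "IDLE": "Active (Idle)",
    "RUNNING": "Active (Running)",
    "CANCELLED": "Cancelled",
    "COMPLETE": "Complete",
    "PAUSED": "Paused (By Macie)",
    "USER_PAUSED": "Paused (By user)",
}


def describe_status(job):
    """Summarize status details from a DescribeClassificationJob response."""
    status = job["jobStatus"]
    paused = job.get("userPausedDetails", {})
    return {
        "status": STATUS_LABELS.get(status, status),
        # Counterparts of the Number of runs and Last run time fields.
        "number_of_runs": job.get("statistics", {}).get("numberOfRuns"),
        "last_run_time": job.get("lastRunTime"),
        # Present only while the job is paused by the user.
        "paused_at": paused.get("jobPausedAt"),
        "expires_at": paused.get("jobExpiresAt"),
    }


# Example (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("macie2")
#   print(describe_status(client.describe_classification_job(jobId="<job ID>")))
```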

Depending on the job’s current status, you can optionally pause, resume, or cancel the job.

Pausing, resuming, or cancelling sensitive data discovery jobs

After you create a sensitive data discovery job, you can pause it temporarily or cancel it permanently. When you pause a job that's actively running, Macie immediately begins to pause all processing tasks for the job. When you cancel a job that's actively running, Macie immediately begins to stop all processing tasks for the job. You can’t resume or restart a job after it’s cancelled.

If you pause a one-time job, you can resume it within 30 days. When you resume the job, Macie immediately resumes processing from the point where you paused the job—Macie doesn't restart the job from the beginning. If you don't resume a one-time job within 30 days of pausing it, the job expires and Macie cancels it.

If you pause a periodic job, you can resume it at any time. If you resume a periodic job and the job was idle when you paused it, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job. If you resume a periodic job and the job was actively running when you paused it, how Macie resumes the job depends on when you resume the job:

  • If you resume the job within 30 days of pausing it, Macie immediately resumes the latest scheduled run from the point where you paused the job—Macie doesn't restart the run from the beginning.

  • If you don't resume the job within 30 days of pausing it, the latest scheduled run expires and Macie cancels all remaining processing tasks for the run. When you subsequently resume the job, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job.

To help you determine when a paused job or job run will expire, Macie adds an expiration date to the job’s details while the job is paused. To check this date, choose the job’s name in the table on the Jobs page, and then refer to the Expires field in the Status details section of the details panel. In addition, we notify you approximately seven days before the job or job run will expire. We notify you again when the job or job run expires and is cancelled. To notify you, we send email to the address that's associated with your AWS account. We also create AWS Health events and Amazon CloudWatch Events for your account.
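
The resume rules above can be condensed into a small decision function. This is only an illustration of the rules as described in this section, not Macie's implementation: the function name `resume_outcome` and its return strings are hypothetical, the `ONE_TIME` and `SCHEDULED` values follow the API's job types, and treating exactly 30 days as still within the window is an assumption. For a real job, rely on the Expires field rather than computing the date yourself.

```python
from datetime import datetime, timedelta, timezone

PAUSE_WINDOW = timedelta(days=30)


def resume_outcome(job_type, was_running, paused_at, now):
    """Describe what happens to a user-paused job at time `now`.

    job_type is "ONE_TIME" or "SCHEDULED" (periodic); was_running says
    whether the job was actively running when it was paused.
    """
    if job_type not in ("ONE_TIME", "SCHEDULED"):
        raise ValueError(f"unknown job type: {job_type}")
    expired = now - paused_at > PAUSE_WINDOW  # assumption: day 30 still counts
    if job_type == "ONE_TIME":
        # A one-time job that is not resumed in time expires and is cancelled.
        return "expired and cancelled" if expired else "resume from pause point"
    if was_running and not expired:
        # The interrupted run continues; it is not restarted.
        return "resume latest run from pause point"
    # Idle when paused, or the interrupted run expired and was cancelled.
    return "resume on schedule"


paused = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(resume_outcome("SCHEDULED", True, paused, paused + timedelta(days=45)))
```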

To pause, resume, or cancel a job
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, select the check box for the job that you want to pause, resume, or cancel, and then do one of the following on the Actions menu:

    • To pause the job temporarily, choose Pause. This option is available only if the job's current status is Active (Idle), Active (Running), or Paused (By Macie).

    • To resume the job, choose Resume. This option is available only if the job's current status is Paused (By user).

    • To cancel the job permanently, choose Cancel. If you choose this option, you can't subsequently resume or restart the job.
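
The same three actions are available through the UpdateClassificationJob operation. The following is a minimal sketch, assuming boto3's `update_classification_job` and `describe_classification_job`; the helper `change_job_status` and the `ACTIONS` table are hypothetical. The local status check mirrors the availability rules in the steps above. No local rule is applied to cancel, because this page doesn't list the statuses from which it is available.

```python
# For each action: the current statuses that allow it (None means no local
# check) and the jobStatus value to send in the update request.
ACTIONS = {
    "pause": ({"IDLE", "RUNNING", "PAUSED"}, "USER_PAUSED"),
    "resume": ({"USER_PAUSED"}, "RUNNING"),
    "cancel": (None, "CANCELLED"),
}


def change_job_status(client, job_id, action):
    """Pause, resume, or cancel a job and return the requested status."""
    allowed, target = ACTIONS[action]
    current = client.describe_classification_job(jobId=job_id)["jobStatus"]
    if allowed is not None and current not in allowed:
        raise ValueError(f"cannot {action} a job whose status is {current}")
    client.update_classification_job(jobId=job_id, jobStatus=target)
    return target


# Example (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("macie2")
#   change_job_status(client, "<job ID>", "pause")
```

Because cancelling is permanent, a caller may want to add its own confirmation step before passing `"cancel"`.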

Copying sensitive data discovery jobs

To quickly create a new sensitive data discovery job that's similar to an existing job, you can create a copy of the job, edit the copy's settings, and then save the copy as a new job. This can be helpful when you want to create a custom variation of an existing job. It's also how you adjust the configuration settings for an existing job: cancel the job, and then copy, change, and save the settings as a new job.

To copy a job
  1. Open the Amazon Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. Select the check box for the job that you want to copy.

  4. On the Actions menu, choose Copy to new.

  5. Complete the steps on the console to review and adjust the settings for the copy of the job. For the Refine the scope step, consider choosing options that prevent the job from analyzing existing data in the same way again:

    • For a one-time job, use object criteria to include only those objects that were created or changed after a certain time. For example, if you're creating a copy of a job that you cancelled, add a Last modified condition that specifies the date and time when you cancelled the existing job.

    • For a periodic job, clear the Include existing objects check box. If you do this, the first run of the job analyzes only those objects that are created or changed after you create the job and before the job's first run. You can also use object criteria to exclude objects that were last modified before a certain date and time.

    For additional details about this and other steps, see Creating a sensitive data discovery job.

  6. When you finish, choose Submit to save the copy as a new job.
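
There is no single API call for Copy to new, but a similar result can be approximated by reading the original job and creating a new one from its settings. The sketch below assumes boto3's `describe_classification_job` and `create_classification_job`. The helper `build_copy_request` and the `COPIED_FIELDS` list are hypothetical, and a real response may need further cleanup before the service accepts it. The scope adjustments follow the guidance in step 5: a Last modified include condition, and for periodic jobs `initialRun=False` as the counterpart of clearing Include existing objects.

```python
import copy
import uuid
from datetime import datetime, timezone

# Settings carried over from the original job when they are present.
COPIED_FIELDS = (
    "jobType", "s3JobDefinition", "samplingPercentage", "scheduleFrequency",
    "customDataIdentifierIds", "allowListIds", "managedDataIdentifierSelector",
    "managedDataIdentifierIds", "description", "tags",
)


def build_copy_request(original, new_name, modified_after=None):
    """Build create_classification_job arguments from a described job.

    modified_after is an optional timezone-aware datetime; when given, the
    new job includes only objects last modified after that time.
    """
    request = {
        key: copy.deepcopy(original[key])
        for key in COPIED_FIELDS
        if key in original
    }
    request["name"] = new_name
    request["clientToken"] = str(uuid.uuid4())  # idempotency token
    if request["jobType"] == "SCHEDULED":
        # Periodic job: skip objects that existed before the job was created.
        request["initialRun"] = False
    if modified_after is not None:
        stamp = modified_after.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        term = {"simpleScopeTerm": {
            "key": "OBJECT_LAST_MODIFIED_DATE",
            "comparator": "GT",
            "values": [stamp],
        }}
        scoping = request["s3JobDefinition"].setdefault("scoping", {})
        scoping.setdefault("includes", {}).setdefault("and", []).append(term)
    return request


# Example (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("macie2")
#   original = client.describe_classification_job(jobId="<job ID>")
#   cancelled_at = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
#   request = build_copy_request(original, "copy-of-job", cancelled_at)
#   new_job = client.create_classification_job(**request)
```

Building the request separately from sending it makes it possible to review the copied settings before the new job is created, much like the review step on the console.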