Managing sensitive data discovery jobs - Amazon Macie

Managing sensitive data discovery jobs

To help you manage your sensitive data discovery jobs, Amazon Macie provides a complete inventory of your jobs in each AWS Region. With this inventory, you can manage your jobs as a single collection, and access the configuration settings, status, and processing statistics for individual jobs. You can also access the sensitive data findings and other results that each job produced.

You can also create custom variations of individual jobs: copy an existing job, adjust the settings for the copy, and then save the copy as a new job. This is helpful if you want to analyze different sets of data in the same way, or the same set of data in different ways. It's also how you change the configuration settings for an existing job: cancel the existing job, copy it, and then adjust and save the copy as a new job.

Viewing your inventory of sensitive data discovery jobs

The Jobs page on the Amazon Macie console provides information about all the sensitive data discovery jobs for your account in the current AWS Region. For each job, the table displays summary information that includes: the current status of the job; whether the job runs on a scheduled, periodic basis; and whether the job analyzes specific S3 buckets or S3 buckets that match run-time criteria. If you choose a job in the table, the details panel displays the configuration settings and other information about the job.

To view your job inventory

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs. The Jobs page opens and displays the number of jobs in your inventory and a table of those jobs.

  3. To find a specific job more quickly, do any of the following:

    • To sort the table by a specific field, click the column heading for the field. To change the sort order, click the column heading again.

    • To show only those jobs that have a specific value for a field, place your cursor in the filter bar. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose Apply.

    • To hide jobs that have a specific value for a field, place your cursor in the filter bar. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose Apply. In the filter box for the new filter, choose the equals icon (a solid, dark gray circle). This changes the filter's operator from equals to not equals (an empty, dark gray circle with a backslash).

    • To remove a filter, choose the remove filter icon (a circle with an X in it) in the filter box for the filter to remove.

  4. To review the configuration settings and other details for a particular job, choose the job's name in the table, and then refer to the details panel.
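If you prefer to work with the job inventory programmatically, the following is a minimal sketch. It assumes a boto3 `macie2` client, whose `list_classification_jobs` operation returns job summaries in `items` and pages with `nextToken`. The helper names `list_all_jobs` and `filter_jobs` are hypothetical. `filter_jobs` mirrors the console's equals and not equals filter operators on the client side.

```python
from typing import Any, Dict, Iterable, List


def list_all_jobs(macie_client: Any, **kwargs: Any) -> List[Dict[str, Any]]:
    """Collect every job summary in one Region by following nextToken pages.

    macie_client is assumed to be a boto3 "macie2" client (or any object with a
    compatible list_classification_jobs method). Extra keyword arguments, such as
    filterCriteria or sortCriteria, are passed through unchanged.
    """
    jobs: List[Dict[str, Any]] = []
    params: Dict[str, Any] = dict(kwargs)
    while True:
        page = macie_client.list_classification_jobs(**params)
        jobs.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return jobs
        params["nextToken"] = token  # request the next page


def filter_jobs(jobs: Iterable[Dict[str, Any]], field: str, value: Any,
                negate: bool = False) -> List[Dict[str, Any]]:
    """Client-side equivalent of the console filter bar (hypothetical helper).

    negate=False keeps jobs whose field equals value (the "equals" operator);
    negate=True hides those jobs instead (the "not equals" operator).
    """
    return [job for job in jobs if (job.get(field) == value) != negate]
```

The inventory is per Region, so you would create one client per Region, for example `boto3.client("macie2", region_name="us-east-1")`. You can then sort the result locally, for example with `sorted(jobs, key=lambda job: job["name"])`.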

Viewing configuration settings for sensitive data discovery jobs

On the Amazon Macie console, you can use the details panel on the Jobs page to view configuration settings and other information about an individual sensitive data discovery job. For example, you can view a list of the S3 buckets that a job is configured to analyze, and which custom data identifiers, if any, the job uses to analyze data in those buckets.

Note that you can’t change any settings for an existing job. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

To change the settings for an existing job, cancel the job, copy it, configure the copy to use the settings that you want, and then save the copy as a new job. If you do this, take steps to ensure that the new job doesn't analyze existing data in the same way again. To do this, note the date and time when you cancel the existing job. Then configure the scope of the new job to include only those objects that are created or changed after you cancelled the original job. For example, use object criteria to add a Last modified exclude condition that specifies the date and time when you cancelled the original job. For more information, see Scope options for jobs.
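As a rough illustration, a Last modified exclude condition might be expressed in API terms as shown below. This sketch assumes the `scoping` structure of the Macie `CreateClassificationJob` request: a `simpleScopeTerm` with the `OBJECT_LAST_MODIFIED_DATE` key and a UTC, extended ISO 8601 timestamp. The helper name is hypothetical. It builds a fresh `scoping` object and does not merge with any existing object criteria.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def last_modified_exclude_scoping(cancelled_at: datetime) -> Dict[str, Any]:
    """Exclude objects last modified before cancelled_at (hypothetical helper).

    Assumes Macie's scoping/simpleScopeTerm request shape and a timestamp in
    UTC, extended ISO 8601 format (for example, 2024-05-01T14:30:00Z).
    """
    if cancelled_at.tzinfo is None:
        raise ValueError("cancelled_at must be timezone-aware")
    timestamp = cancelled_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "excludes": {
            "and": [
                {
                    "simpleScopeTerm": {
                        "comparator": "LT",  # strictly earlier than the cancel time
                        "key": "OBJECT_LAST_MODIFIED_DATE",
                        "values": [timestamp],
                    }
                }
            ]
        }
    }
```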

To view a job's configuration settings

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, choose the name of the job whose settings you want to view. The details panel displays the configuration settings and other information about the job.

    Depending on the job's settings, the panel contains the following sections:

    • General information – This section indicates the current status of the job and provides general information about it, for example, the Amazon Resource Name (ARN) of the job and the most recent date and time when the job started to run. If you paused the job during the past 30 days, this section also indicates when you paused the job and when the job or job run will expire if you don't resume it.

    • Statistics – This section shows processing statistics for the job—for example, the number of times that the job has run and the approximate number of objects that the job has yet to process during its current run.

    • Scope – This section indicates how often the job runs. It also shows the settings that refine the scope of the job—for example, the sampling depth and any object criteria that include or exclude S3 objects from the job's analysis.

    • S3 buckets – This section appears in the panel if the job is configured to analyze buckets that you explicitly selected when you created the job. It indicates the number of AWS accounts that the job is configured to analyze data for. It also indicates the number of buckets that the job is configured to analyze, and the names of those buckets (grouped by account). To show the complete list of accounts and buckets in JSON format, choose the number in the Total buckets field.

    • S3 bucket criteria – This section appears in the panel if the job uses run-time criteria to determine which buckets to analyze. It lists any inclusion and exclusion criteria that the job is configured to use. To review the criteria in JSON format, choose Details, and then choose the Criteria tab in the window that appears.

      Tip

      To review a table of buckets that currently match the criteria, choose Details, and then choose the Matching buckets tab in the window that appears. Optionally, choose refresh (an empty, dark gray circle with an arrow) to retrieve the latest data.

      If the job has already run, you can also determine whether any buckets matched the criteria when the job ran and, if so, which buckets they were. To do this, review the job's status log events: choose Show results at the top of the panel, and then choose Show CloudWatch logs. Macie opens the Amazon CloudWatch console and displays a table of log events for the job, including a BUCKET_MATCHED_THE_CRITERIA event for each bucket that matched the criteria and was included in the job's analysis.

    • Custom data identifiers – This section appears in the panel if the job is configured to use custom data identifiers to analyze data. It lists the names of those custom data identifiers.

  4. (Optional) To view and save the job's settings in JSON format, choose the unique identifier for the job (Job ID) at the top of the panel, and then choose Download.
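A similar JSON record can be produced outside the console. The following minimal sketch assumes a boto3 `macie2` client and its `describe_classification_job` operation. The output is not guaranteed to match the console's download byte for byte. `default=str` converts the `datetime` values that boto3 returns into strings, and `job_settings_json` is a hypothetical name.

```python
import json
from typing import Any


def job_settings_json(macie_client: Any, job_id: str) -> str:
    """Return a job's description as pretty-printed JSON (hypothetical helper).

    macie_client is assumed to be a boto3 "macie2" client. Timestamps such as
    createdAt come back as datetime objects, so default=str serializes them.
    """
    response = macie_client.describe_classification_job(jobId=job_id)
    # boto3 adds ResponseMetadata to every response; it isn't part of the job.
    job = {key: value for key, value in response.items() if key != "ResponseMetadata"}
    return json.dumps(job, indent=2, sort_keys=True, default=str)
```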

Checking the status of sensitive data discovery jobs

When you create a sensitive data discovery job, its initial status is Active (Running) or Active (Idle), depending on the job's type and schedule. The job then passes through additional states, which you can monitor as the job progresses.

Tip

In addition to monitoring the overall status of a job, you can monitor specific events that occur as a job progresses. You can do this by using logging data that Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's status and details about any account- or bucket-level errors that occur while a job runs. For more information, see Monitoring jobs.
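For example, the BUCKET_MATCHED_THE_CRITERIA events described earlier could be collected with a sketch like the following. It makes several assumptions: a boto3 `logs` client, job events in the `/aws/macie/classificationjobs` log group with one log stream per job ID, and each message being a JSON object with an `eventType` field. Verify these against Monitoring jobs before relying on them. The function name is hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumption: Macie writes job log events to this log group, one stream per job ID.
MACIE_JOB_LOG_GROUP = "/aws/macie/classificationjobs"


def job_events_of_type(logs_client: Any, job_id: str, event_type: str) -> List[Dict[str, Any]]:
    """Return parsed log records for one job whose eventType matches event_type.

    logs_client is assumed to be a boto3 "logs" client; filter_log_events is
    paged with nextToken. Messages that aren't JSON objects are skipped.
    """
    params: Dict[str, Any] = {
        "logGroupName": MACIE_JOB_LOG_GROUP,
        "logStreamNames": [job_id],
    }
    matches: List[Dict[str, Any]] = []
    while True:
        page = logs_client.filter_log_events(**params)
        for event in page.get("events", []):
            try:
                record = json.loads(event.get("message", ""))
            except ValueError:
                continue  # not valid JSON
            if isinstance(record, dict) and record.get("eventType") == event_type:
                matches.append(record)
        token = page.get("nextToken")
        if not token:
            return matches
        params["nextToken"] = token
```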

To check the status of a job

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, locate the job whose status you want to check. The Status field indicates the current status of the job:

    • Active (Idle) – For a periodic job, the previous run is complete and the next scheduled run is pending. This value doesn't apply to one-time jobs.

    • Active (Running) – For a one-time job, the job is currently in progress. For a periodic job, a scheduled run is in progress.

    • Cancelled – For any type of job, the job was stopped permanently (cancelled). A job has this status if you explicitly cancelled it or, if it's a one-time job, you paused the job and didn't resume it within 30 days. A job can also have this status if you suspended Macie in the current AWS Region.

    • Complete – For a one-time job, the job ran successfully and is now complete. This value doesn't apply to periodic jobs. Instead, the status of a periodic job changes to Active (Idle) when each run completes successfully.

    • Paused (By Macie) – For any type of job, the job was stopped temporarily (paused) by Macie.

      A job has this status if completion of the job or a job run would exceed the monthly sensitive data discovery quota for your account or any member accounts that the job analyzes data for. When this happens, Macie automatically pauses the job. Macie automatically resumes the job when the subsequent month starts or the quota is increased for all the affected accounts.

    • Paused (By user) – For any type of job, the job was stopped temporarily (paused) by you.

      If you pause a one-time job and you don't resume it within 30 days, the job expires and Macie cancels it. If you pause a periodic job while it's actively running and you don't resume it within 30 days, the job's run expires and Macie cancels the run. To check the expiration date for a paused job or job run, choose the job's name in the table, and then refer to the Expires field in the Status details section of the details panel.

If a job is cancelled or paused, you can refer to the job's details to determine whether the job started to run or, for a periodic job, ran at least once before it was cancelled or paused. To do this, choose the job's name in the table, and then refer to the details panel. In the panel, the Number of runs field indicates the number of times that the job has run. The Last run time field indicates the most recent date and time when the job started to run.
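When you read job status through the Macie API instead of the console, the `jobStatus` field uses enumerated values rather than the console labels. The sketch below assumes that mapping. It also checks the `statistics.numberOfRuns` and `lastRunTime` fields of a `describe_classification_job` response, which correspond to the Number of runs and Last run time fields in the panel. Both helper names are hypothetical.

```python
from typing import Any, Dict

# Console label for each jobStatus value reported by the Macie API (assumed mapping).
CONSOLE_STATUS = {
    "IDLE": "Active (Idle)",
    "RUNNING": "Active (Running)",
    "CANCELLED": "Cancelled",
    "COMPLETE": "Complete",
    "PAUSED": "Paused (By Macie)",
    "USER_PAUSED": "Paused (By user)",
}


def console_status(job: Dict[str, Any]) -> str:
    """Translate a job's API status into the label shown on the Jobs page."""
    return CONSOLE_STATUS.get(job.get("jobStatus", ""), "Unknown")


def has_started_running(description: Dict[str, Any]) -> bool:
    """True if a describe response shows a run count or a last run time."""
    runs = description.get("statistics", {}).get("numberOfRuns", 0)
    return bool(runs > 0 or description.get("lastRunTime") is not None)
```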

Depending on the job’s current status, you can optionally pause, resume, or cancel the job.

Pausing, resuming, or cancelling sensitive data discovery jobs

After you create a sensitive data discovery job, you can pause it temporarily or cancel it permanently. When you pause a job that's actively running, Macie immediately begins to pause all processing tasks for the job. When you cancel a job that's actively running, Macie immediately begins to stop all processing tasks for the job. You can’t resume or restart a job after it’s cancelled.

If you pause a one-time job, you can resume it within 30 days. When you resume the job, Macie immediately resumes processing from the point where you paused the job—Macie doesn't restart the job from the beginning. If you don't resume a one-time job within 30 days of pausing it, the job expires and Macie cancels it.

If you pause a periodic job, you can resume it at any time. If you resume a periodic job and the job was idle when you paused it, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job. If you resume a periodic job and the job was actively running when you paused it, how Macie resumes the job depends on when you resume the job:

  • If you resume the job within 30 days of pausing it, Macie immediately resumes the latest scheduled run from the point where you paused the job—Macie doesn't restart the run from the beginning.

  • If you don't resume the job within 30 days of pausing it, the latest scheduled run expires and Macie cancels all remaining processing tasks for the run. When you subsequently resume the job, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job.
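The resume rules above reduce to a small decision table. The following sketch encodes them, assuming the API's `ONE_TIME` and `SCHEDULED` job types. How this sketch treats exactly 30 days is an assumption; the Expires field in the job's details is authoritative. `resume_outcome` is a hypothetical name.

```python
PAUSE_LIMIT_DAYS = 30


def resume_outcome(job_type: str, running_when_paused: bool, days_paused: float) -> str:
    """Describe what happens when a paused job is resumed (hypothetical helper)."""
    within_limit = days_paused < PAUSE_LIMIT_DAYS  # boundary handling is an assumption
    if job_type == "ONE_TIME":
        if within_limit:
            return "resumes from the point where it was paused"
        return "expired and cancelled; can't be resumed"
    if job_type != "SCHEDULED":
        raise ValueError(f"unknown job type: {job_type!r}")
    if not running_when_paused:
        # An idle periodic job can be resumed at any time.
        return "resumes on its original schedule"
    if within_limit:
        return "resumes the latest scheduled run from the point where it was paused"
    return "latest run cancelled; resumes on its original schedule"
```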

To help you determine when a paused job or job run will expire, Macie adds an expiration date to the job’s details while the job is paused. To check this date, choose the job’s name in the table on the Jobs page, and then refer to the Expires field in the Status details section of the details panel. In addition, we notify you approximately seven days before the job or job run will expire. We notify you again when the job or job run expires and is cancelled. To notify you, we send email to the address that's associated with your Amazon Web Services account. We also create AWS Health events and Amazon CloudWatch Events for your account.
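To estimate these dates yourself, add 30 days to the time you paused the job, and subtract about seven days from the result for the first notification. This is only an approximation. Through the API, the `describe_classification_job` response reports the actual date in `userPausedDetails.jobExpiresAt`, and that value is what you should rely on. The helper below is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

PAUSE_LIMIT = timedelta(days=30)
ADVANCE_NOTICE = timedelta(days=7)  # "approximately seven days" before expiry


def estimated_pause_dates(paused_at: datetime) -> Tuple[datetime, datetime]:
    """Return (estimated expiry, approximate first-notice time) for a paused job or run."""
    expires_at = paused_at + PAUSE_LIMIT
    return expires_at, expires_at - ADVANCE_NOTICE
```

For example, a job paused at 10:00 UTC on March 1, 2024 would have an estimated expiry of 10:00 UTC on March 31, with the first notice around March 24.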

To pause, resume, or cancel a job

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. On the Jobs page, select the check box for the job that you want to pause, resume, or cancel, and then do one of the following on the Actions menu:

    • To pause the job temporarily, choose Pause. This option is available only if the job's current status is Active (Idle), Active (Running), or Paused (By Macie).

    • To resume the job, choose Resume. This option is available only if the job's current status is Paused (By user).

    • To cancel the job permanently, choose Cancel. If you choose this option, you can't subsequently resume or restart the job.
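The same three actions map to a single API operation. The following minimal sketch assumes a boto3 `macie2` client, whose `update_classification_job` operation accepts `jobStatus` values of `USER_PAUSED` (pause), `RUNNING` (resume), and `CANCELLED` (cancel). Before calling it, the sketch checks the same status prerequisites that the console enforces for Pause and Resume. The `change_job_status` helper and its rule table are hypothetical.

```python
from typing import Any

# Target API status for each console action (assumed mapping).
ACTION_TARGET = {"pause": "USER_PAUSED", "resume": "RUNNING", "cancel": "CANCELLED"}

# Statuses from which each action is allowed. Pause and Resume follow the
# console rules above; for Cancel, this sketch assumes any job that isn't
# already Cancelled or Complete can be cancelled.
ALLOWED_FROM = {
    "pause": {"IDLE", "RUNNING", "PAUSED"},
    "resume": {"USER_PAUSED"},
    "cancel": {"IDLE", "RUNNING", "PAUSED", "USER_PAUSED"},
}


def change_job_status(macie_client: Any, job_id: str, current_status: str, action: str) -> str:
    """Pause, resume, or cancel a job and return the requested API status."""
    if action not in ACTION_TARGET:
        raise ValueError(f"unknown action: {action!r}")
    if current_status not in ALLOWED_FROM[action]:
        raise ValueError(f"can't {action} a job whose status is {current_status}")
    target = ACTION_TARGET[action]
    macie_client.update_classification_job(jobId=job_id, jobStatus=target)
    return target
```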

Copying sensitive data discovery jobs

To quickly create a new sensitive data discovery job that's similar to an existing job, you can copy the job, edit the copy's settings, and then save the copy as a new job. This is helpful if you want to create a custom variation of an existing job. It's also helpful if you want to change the settings for an existing job: cancel the job, and then copy it, change the settings, and save the copy as a new job.

To copy a job

  1. Open the Macie console at https://console.aws.amazon.com/macie/.

  2. In the navigation pane, choose Jobs.

  3. Select the check box for the job that you want to copy.

  4. On the Actions menu, choose Copy to new.

  5. Complete the steps on the console to review and adjust the settings for the copy of the job. On the Scope page, consider choosing options that prevent the job from analyzing existing data in the same way again:

    • For a one-time job, use object criteria to include only those objects that were created or changed after a certain time. For example, if you're creating a copy of a job that you cancelled, add a Last modified condition that specifies the date and time when you cancelled the existing job.

    • For a periodic job, clear the Include existing objects check box. If you do this, the first run of the job analyzes only those objects that are created or changed after you create the job and before the job's first run. You can also use object criteria to exclude objects that were last modified before a certain date and time.

  6. When you finish, choose Submit to save the copy as a new job.
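Copying a job can also be approximated through the API: describe the existing job, keep only the fields that a create request accepts, and adjust the scope. The following sketch assumes field names from the Macie `DescribeClassificationJob` and `CreateClassificationJob` operations. The `COPYABLE_FIELDS` list and `build_copy_request` helper are hypothetical. Unlike the console steps, the sketch adds an *include* condition (modified at or after a time) rather than an exclude condition. Terms in an `and` list must all match, so appending an include term preserves any existing criteria, whereas appending to an existing exclude group would narrow it.

```python
import copy
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# Describe-response fields that this sketch carries over to a create request.
# Read-only fields (jobId, jobArn, jobStatus, statistics, timestamps) are dropped.
COPYABLE_FIELDS = (
    "allowListIds",
    "customDataIdentifierIds",
    "description",
    "jobType",
    "managedDataIdentifierIds",
    "managedDataIdentifierSelector",
    "s3JobDefinition",
    "samplingPercentage",
    "scheduleFrequency",
    "tags",
)


def build_copy_request(described: Dict[str, Any], new_name: str,
                       modified_since: Optional[datetime] = None) -> Dict[str, Any]:
    """Turn an existing job's description into parameters for a new job."""
    request = {key: copy.deepcopy(described[key])
               for key in COPYABLE_FIELDS if key in described}
    request["name"] = new_name
    request["clientToken"] = str(uuid.uuid4())  # idempotency token for the create call

    if request.get("jobType") == "SCHEDULED":
        # Equivalent of clearing "Include existing objects" on the Scope page.
        request["initialRun"] = False

    if modified_since is not None:
        if modified_since.tzinfo is None:
            raise ValueError("modified_since must be timezone-aware")
        stamp = modified_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        scoping = request.setdefault("s3JobDefinition", {}).setdefault("scoping", {})
        include_terms = scoping.setdefault("includes", {}).setdefault("and", [])
        include_terms.append({"simpleScopeTerm": {
            "comparator": "GTE",
            "key": "OBJECT_LAST_MODIFIED_DATE",
            "values": [stamp],
        }})
    return request
```

With a boto3 `macie2` client, you might then call `create_classification_job(**build_copy_request(described, "new-name", cancelled_at))`. Review the result first, just as you would review the console's Copy to new steps.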