
AWS Glue job run statuses on the console

You can use the AWS Glue console to view the status of an AWS Glue extract, transform, and load (ETL) job while it is running or after it has stopped. For more information about job run statuses, see AWS Glue job run statuses.

Accessing the job monitoring dashboard

You access the job monitoring dashboard by choosing the Job run monitoring link in the AWS Glue navigation pane under ETL jobs.

Overview of the job monitoring dashboard

The job monitoring dashboard provides an overall summary of the job runs, with totals for the jobs with a status of Running, Canceled, Success, or Failed. Additional tiles provide the overall job run success rate, the estimated DPU usage for jobs, and breakdowns of the job status counts by job type, worker type, and day.

The graphs in the tiles are interactive. You can choose any block in a graph to apply a filter that displays only those jobs in the Job runs table at the bottom of the page.

You can change the date range for the information displayed on this page by using the Date range selector. When you change the date range, the information tiles adjust to show the values for the specified number of days before the current date. You can also use a specific date range if you choose Custom from the date range selector.

Job runs view

Note

Job run history is accessible for 90 days for your workflows and job runs.

The Job runs resource list shows the jobs for the specified date range and filters.

You can filter the jobs on additional criteria, such as status, worker type, job type, and job name. In the filter box at the top of the table, enter the text to use as a filter. As you type, the table updates to show only the rows that contain matching text.

You can view a subset of the jobs by choosing elements from the graphs on the job monitoring dashboard. For example, if you choose the number of running jobs in the Job runs summary tile, then the Job runs list displays only the jobs that currently have a status of Running. If you choose one of the bars in the Worker type breakdown bar chart, then only job runs with the matching worker type and status are shown in the Job runs list.
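
If you prefer to retrieve the same run history programmatically, the GetJobRuns API operation returns the runs for a job. The following is a minimal boto3 sketch that mimics the console's status filter; the job name is a placeholder, and pagination is handled manually with NextToken.

    import boto3

    glue = boto3.client("glue")

    def runs_with_status(job_name, status="RUNNING"):
        """Return the job runs for job_name whose state matches status."""
        matching, token = [], None
        while True:
            kwargs = {"JobName": job_name}
            if token:
                kwargs["NextToken"] = token
            page = glue.get_job_runs(**kwargs)
            matching += [r for r in page["JobRuns"] if r["JobRunState"] == status]
            token = page.get("NextToken")
            if not token:
                return matching

    # "my-etl-job" is a placeholder job name.
    for run in runs_with_status("my-etl-job", "RUNNING"):
        print(run["Id"], run["StartedOn"], run["JobRunState"])

Passing "SUCCEEDED" instead of "RUNNING" reproduces the filter applied when you choose the success count in the Job runs summary tile.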

The Job runs resource list displays the details for the job runs. You can sort the rows in the table by choosing a column heading. The table contains the following information:

Job name

The name of the job.

Type

The type of job environment:

  • Glue ETL: Runs in an Apache Spark environment managed by AWS Glue.

  • Glue Streaming: Runs in an Apache Spark environment and performs ETL on data streams.

  • Python shell: Runs Python scripts as a shell.

Start time

The date and time at which this job run was started.

End time

The date and time that this job run completed.

Run status

The current state of the job run. Values can be:

  • STARTING

  • RUNNING

  • STOPPING

  • STOPPED

  • SUCCEEDED

  • FAILED

  • TIMEOUT

Run time

The amount of time that the job run consumed resources.

Capacity

The number of AWS Glue data processing units (DPUs) that were allocated for this job run. For more information about capacity planning, see Monitoring for DPU Capacity Planning in the AWS Glue Developer Guide.

Worker type

The type of predefined worker that was allocated when the job ran. Values can be G.1X, G.2X, G.4X, or G.8X.

  • G.1X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 1 DPU (4 vCPUs, 16 GB of memory) with 84 GB disk (approximately 34 GB free). We recommend this worker type for memory-intensive jobs. This is the default Worker type for AWS Glue Version 2.0 or later jobs.

  • G.2X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 2 DPUs (8 vCPUs, 32 GB of memory) with 128 GB disk (approximately 77 GB free). We recommend this worker type for memory-intensive jobs and jobs that run machine learning transforms.

  • G.4X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 4 DPUs (16 vCPUs, 64 GB of memory) with 256 GB disk (approximately 235 GB free). We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later Spark ETL jobs in the following AWS Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

  • G.8X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 8 DPUs (32 vCPUs, 128 GB of memory) with 512 GB disk (approximately 487 GB free). We recommend this worker type for jobs whose workloads contain your most demanding transforms, aggregations, joins, and queries. This worker type is available only for AWS Glue version 3.0 or later Spark ETL jobs, in the same AWS Regions as the G.4X worker type.

DPU hours

The estimated number of DPUs used for the job run. A DPU is a relative measure of processing power. DPUs are used to determine the cost of running your job. For more information, see the AWS Glue pricing page.
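
The fields in this table are also returned by the GetJobRuns API operation. As a rough illustration of how DPU hours relate to run time and capacity, the sketch below estimates DPU hours as execution seconds multiplied by the run's maximum capacity; treat this as an approximation for monitoring purposes, not the exact billing calculation, and treat the job name as a placeholder.

    import boto3

    glue = boto3.client("glue")

    def estimated_dpu_hours(job_name):
        """Rough per-run estimate: execution seconds * MaxCapacity / 3600."""
        response = glue.get_job_runs(JobName=job_name, MaxResults=50)
        for run in response["JobRuns"]:
            seconds = run.get("ExecutionTime", 0)  # run time in seconds
            dpus = run.get("MaxCapacity") or run.get("AllocatedCapacity", 0)
            yield run["Id"], run["JobRunState"], seconds * dpus / 3600.0

    # Print an approximate DPU-hour figure for each recent run.
    for run_id, state, dpu_hours in estimated_dpu_hours("my-etl-job"):
        print(f"{run_id}: {state}, ~{dpu_hours:.2f} DPU-hours")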

You can choose any job run in the list and view additional information. Choose a job run, and then do one of the following:

  • Choose the Actions menu and the View job option to view the job in the visual editor.

  • Choose the Actions menu and the Stop run option to stop the current run of the job (a programmatic equivalent is sketched after this list).

  • Choose the View CloudWatch logs button to view the job run logs for that job.

  • Choose View details to view the job run details page.
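
The Stop run action corresponds to the BatchStopJobRun API operation. A minimal boto3 sketch, assuming a placeholder job name and a run ID taken from the Job runs list:

    import boto3

    glue = boto3.client("glue")

    job_name = "my-etl-job"          # placeholder job name
    run_id = "jr_0123456789abcdef"   # placeholder job run ID

    # Request that the in-progress run be stopped; the response lists
    # the runs accepted for stopping and any errors.
    response = glue.batch_stop_job_run(JobName=job_name, JobRunIds=[run_id])
    print(response["SuccessfulSubmissions"])
    print(response["Errors"])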

Viewing the job run logs

You can view the job logs in a variety of ways:

  • On the Monitoring page, in the Job runs table, choose a job run, and then choose View CloudWatch logs.

  • In the visual job editor, on the Runs tab for a job, choose the hyperlinks to view the logs:

    • Logs – Links to the Apache Spark job logs written when continuous logging is enabled for a job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/logs-v2 log group. By default, the logs exclude non-useful Apache Hadoop YARN heartbeat and Apache Spark driver or executor log messages. For more information about continuous logging, see Continuous Logging for AWS Glue Jobs in the AWS Glue Developer Guide.

    • Error logs – Links to the logs written to stderr for this job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/error log group. You can use these logs to view details about any errors that were encountered during the job run.

    • Output logs – Links to the logs written to stdout for this job run. When you choose this link, it takes you to the Amazon CloudWatch logs in the /aws-glue/jobs/output log group. You can use these logs to see all the details about the tables that were created in the AWS Glue Data Catalog and any errors that were encountered.
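
You can also read these log groups directly with the CloudWatch Logs API. The sketch below assumes the common AWS Glue convention that the output and error streams for a run are named after the job run ID; verify the stream names in the CloudWatch console before relying on this, and treat the run ID as a placeholder.

    import boto3

    logs = boto3.client("logs")
    run_id = "jr_0123456789abcdef"   # placeholder job run ID

    # Read the stdout and stderr logs for one run. The log stream name is
    # assumed to match the job run ID, which is the usual Glue convention.
    for group in ("/aws-glue/jobs/output", "/aws-glue/jobs/error"):
        events = logs.get_log_events(
            logGroupName=group,
            logStreamName=run_id,
            startFromHead=True,
        )
        for event in events["events"]:
            print(group, event["message"].rstrip())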

Viewing the details of a job run

You can choose a job in the Job runs list on the Monitoring page, and then choose View run details to see detailed information for that run of the job.

The information displayed on the job run detail page includes:

Job name

The name of the job.

Run status

The current state of the job run. Values can be:

  • STARTING

  • RUNNING

  • STOPPING

  • STOPPED

  • SUCCEEDED

  • FAILED

  • TIMEOUT

Glue version

The AWS Glue version used by the job run.

Recent attempt

The number of automatic retry attempts for this job run.

Start time

The date and time at which this job run was started.

End time

The date and time that this job run completed.

Start-up time

The amount of time spent preparing to run the job.

Execution time

The amount of time spent running the job script.

Trigger name

The name of the trigger associated with the job.

Last modified on

The date when the job was last modified.

Security configuration

The security configuration for the job, which includes Amazon S3 encryption, CloudWatch encryption, and job bookmarks encryption settings.

Timeout

The job run timeout threshold value.

Allocated capacity

The number of AWS Glue data processing units (DPUs) that were allocated for this job run. For more information about capacity planning, see Monitoring for DPU Capacity Planning in the AWS Glue Developer Guide.

Max capacity

The maximum capacity available to the job run.

Number of workers

The number of workers used for the job run.

Worker type

The type of predefined workers allocated for the job run. Values can be G.1X or G.2X.

  • G.1X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 1 DPU (4 vCPUs, 16 GB of memory, 64 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs. This is the default Worker type for AWS Glue Version 2.0 or later jobs.

  • G.2X – When you choose this type, you also provide a value for Number of workers. Each worker maps to 2 DPUs (8 vCPUs, 32 GB of memory, 128 GB disk), and provides 1 executor per worker. We recommend this worker type for memory-intensive jobs and jobs that run machine learning transforms.

Logs

A link to the job logs for continuous logging (/aws-glue/jobs/logs-v2).

Output logs

A link to the job output log files (/aws-glue/jobs/output).

Error logs

A link to the job error log files (/aws-glue/jobs/error).
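
The properties on this page correspond to fields returned by the GetJobRun API operation. A minimal sketch that prints the same details for a single run, using placeholder names:

    import boto3

    glue = boto3.client("glue")

    # Placeholders: use a job name and run ID from the Job runs list.
    run = glue.get_job_run(JobName="my-etl-job", RunId="jr_0123456789abcdef")["JobRun"]

    print("Run status:        ", run["JobRunState"])
    print("Glue version:      ", run.get("GlueVersion"))
    print("Start time:        ", run.get("StartedOn"))
    print("End time:          ", run.get("CompletedOn"))
    print("Execution time (s):", run.get("ExecutionTime"))
    print("Timeout (min):     ", run.get("Timeout"))
    print("Worker type:       ", run.get("WorkerType"))
    print("Number of workers: ", run.get("NumberOfWorkers"))
    print("Max capacity:      ", run.get("MaxCapacity"))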

Additional items are available when you view information for recent job runs. For more information, see View information for recent job runs.

Viewing Amazon CloudWatch metrics for a Spark job run

On the details page for a job run, below the Run details section, you can view the job metrics. AWS Glue Studio sends job metrics to Amazon CloudWatch for every job run.

AWS Glue reports metrics to Amazon CloudWatch every 30 seconds. The AWS Glue metrics represent delta values from the previously reported values. Where appropriate, metrics dashboards aggregate (sum) the 30-second values to obtain a value for the entire last minute. However, the Apache Spark metrics that AWS Glue passes on to Amazon CloudWatch are generally absolute values that represent the current state at the time they are reported.

Note

You must configure your account to access Amazon CloudWatch.

The metrics provide information about your job run, such as:

  • ETL Data Movement – The number of bytes read from or written to Amazon S3.

  • Memory Profile: Heap used – The number of memory bytes used by the Java virtual machine (JVM) heap.

  • Memory Profile: heap usage – The fraction of memory (scale: 0–1), shown as a percentage, used by the JVM heap.

  • CPU Load – The fraction of CPU system load used (scale: 0–1), shown as a percentage.
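
Because these metrics are published to CloudWatch under the Glue namespace, you can also query them with the CloudWatch API. The sketch below retrieves the driver CPU system load for the last hour; the metric and dimension names follow the AWS Glue job metrics documentation, and the job name is a placeholder.

    import boto3
    from datetime import datetime, timedelta, timezone

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)

    # Fetch the driver CPU system load for the last hour in 1-minute buckets.
    response = cloudwatch.get_metric_statistics(
        Namespace="Glue",
        MetricName="glue.driver.system.cpuSystemLoad",
        Dimensions=[
            {"Name": "JobName", "Value": "my-etl-job"},  # placeholder job name
            {"Name": "JobRunId", "Value": "ALL"},        # or a specific jr_... run ID
            {"Name": "Type", "Value": "gauge"},
        ],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])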

Viewing Amazon CloudWatch metrics for a Ray job run

On the details page for a job run, below the Run details section, you can view the job metrics. AWS Glue Studio sends job metrics to Amazon CloudWatch for every job run.

AWS Glue reports metrics to Amazon CloudWatch every 30 seconds. The AWS Glue metrics represent delta values from the previously reported values. Where appropriate, metrics dashboards aggregate (sum) the 30-second values to obtain a value for the entire last minute. However, the Apache Spark metrics that AWS Glue passes on to Amazon CloudWatch are generally absolute values that represent the current state at the time they are reported.

Note

You must configure your account to access Amazon CloudWatch.

In Ray jobs, you can view the following aggregated metric graphs. With these, you can build a profile of your cluster and tasks, and access detailed information about each node. The time-series data that backs these graphs is available in CloudWatch for further analysis.

Task Profile: Task State

Shows the number of Ray tasks in the system. Each task lifecycle is given its own time series.

Task Profile: Task Name

Shows the number of Ray tasks in the system. Only pending and active tasks are shown. Each type of task (by name) is given its own time series.

Cluster Profile: CPUs in use

Shows the number of CPU cores that are used. Each node is given its own time series. Nodes are identified by IP addresses, which are ephemeral and only used for identification.

Cluster Profile: Object store memory use

Shows memory use by the Ray object cache. Each memory location (physical memory, cached on disk, and spilled in Amazon S3) is given its own time series. The object store manages data storage across all nodes in the cluster. For more information, see Objects in the Ray documentation.

Cluster Profile: Node count

Shows the number of nodes provisioned for the cluster.

Node Detail: CPU use

Shows CPU utilization on each node as a percentage. Each series shows an aggregated percentage of CPU usage across all cores on the node.

Node Detail: Memory use

Shows memory use on each node in GB. Each series shows memory aggregated between all processes on the node, including Ray tasks and the Plasma store process. This will not reflect objects stored to disk or spilled to Amazon S3.

Node Detail: Disk use

Shows disk use on each node in GB.

Node Detail: Disk I/O speed

Shows disk I/O on each node in KB/s.

Node Detail: Network I/O throughput

Shows network I/O on each node in KB/s.

Node Detail: CPU use by Ray component

Shows CPU use in fractions of a core. Each Ray component on each node is given its own time series.

Node Detail: Memory use by Ray component

Shows memory use in GiB. Each Ray component on each node is given its own time series.