Working with jobs on the AWS Glue console
A job in AWS Glue consists of the business logic that performs extract, transform, and load (ETL) work. You can create jobs in the ETL section of the AWS Glue console.
To view existing jobs, sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/
From the Jobs list, you can do the following:
- To start an existing job, choose Action, and then choose Run job.
- To stop a Running or Starting job, choose Action, and then choose Stop job run.
- To add triggers that start a job, choose Action, Choose job triggers.
- To modify an existing job, choose Action, and then choose Edit job or Delete.
- To change a script that is associated with a job, choose Action, Edit script.
- To reset the state information that AWS Glue stores about your job, choose Action, Reset job bookmark.
- To create a development endpoint with the properties of this job, choose Action, Create development endpoint.
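Several of these console actions correspond to operations in the AWS Glue API. The following is a minimal boto3 sketch, not a definitive reference; the job name my-etl-job is a hypothetical placeholder, and the example assumes credentials that are allowed to call AWS Glue.

```python
import boto3

glue = boto3.client("glue")
job_name = "my-etl-job"  # hypothetical job name

# Start a job run (console: Action > Run job).
run = glue.start_job_run(JobName=job_name)
run_id = run["JobRunId"]

# Stop a Running or Starting job run (console: Action > Stop job run).
glue.batch_stop_job_run(JobName=job_name, JobRunIds=[run_id])

# Reset the state information AWS Glue stores about the job
# (console: Action > Reset job bookmark).
glue.reset_job_bookmark(JobName=job_name)
```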
To add a new job using the console
- Open the AWS Glue console, and choose the Jobs tab.
- Choose Add job, and follow the instructions in the Add job wizard.
If you decide to have AWS Glue generate a script for your job, you must specify the job properties, data sources, and data targets, and verify the schema mapping of source columns to target columns. The generated script is a starting point for you to add code to perform your ETL work. Verify the code in the script and modify it to meet your business needs.
Note: To get step-by-step guidance for adding a job with a generated script, see the Add job tutorial in the console.
Optionally, you can add a security configuration to a job to specify at-rest encryption options.
If you provide or author the script, the script itself defines the sources, targets, and transforms. However, you must still specify in the job any connections that the script requires. For information about creating your own script, see Providing your own custom scripts.
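For orientation only, the following is a minimal sketch of the shape such a script can take when it reads from an AWS Glue Data Catalog table, maps columns, and writes to Amazon S3. The database, table, mapping, and output path are hypothetical placeholders; a generated or hand-written script for your job will differ.

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job setup: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: a table in the AWS Glue Data Catalog (hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="example_table"
)

# Transform: map source columns to target columns (hypothetical mapping).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("id", "long", "id", "long"), ("name", "string", "name", "string")],
)

# Target: write the result to Amazon S3 as Parquet (hypothetical path).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/output/"},
    format="parquet",
)

job.commit()
```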
The job assumes the permissions of the IAM role that you specify when you create it. This IAM role must have permission to extract data from your data source and write to your target. The AWS Glue console lists only IAM roles that have a trust policy attached for the AWS Glue service principal. For more information about providing roles for AWS Glue, see Identity-based policies for AWS Glue.
If the job reads AWS KMS-encrypted Amazon Simple Storage Service (Amazon S3) data, then the IAM role must have decrypt permission on the KMS key. For more information, see Step 2: Create an IAM role for AWS Glue.
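As an illustration of these permissions, the sketch below attaches an inline policy that allows reading an S3 source and decrypting with its KMS key. The role name, bucket, and key ARN are hypothetical placeholders; scope the actual policy to your own resources and security requirements.

```python
import json
import boto3

iam = boto3.client("iam")

# Hypothetical resources; replace with your own bucket and CMK ARN.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-source-bucket",
                "arn:aws:s3:::example-source-bucket/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",
        },
    ],
}

iam.put_role_policy(
    RoleName="MyGlueJobRole",               # hypothetical role name
    PolicyName="GlueSourceReadAndDecrypt",  # hypothetical policy name
    PolicyDocument=json.dumps(policy),
)
```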
For known problems that can occur when a job runs, see Troubleshooting errors in AWS Glue.
To learn about the properties that are required for each job, see Defining job properties for Spark jobs.
To get step-by-step guidance for adding a job with a generated script, see the Add job tutorial in the AWS Glue console.
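The same job properties can also be supplied programmatically through the CreateJob API. The following is a minimal boto3 sketch; the job name, role, script location, and worker settings are hypothetical placeholders.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical values; adjust to your account and workload.
glue.create_job(
    Name="my-etl-job",
    Role="MyGlueJobRole",  # IAM role the job assumes
    Command={
        "Name": "glueetl",  # Spark ETL job
        "ScriptLocation": "s3://example-bucket/scripts/my_etl_job.py",
        "PythonVersion": "3",
    },
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=2,
    Timeout=60,  # minutes
)
```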
Viewing job details
To see details of a job, select the job in the Jobs list and review the information on the following tabs:
- History
- Details
- Script
- Metrics
History
The History tab shows your job run history and how successful a job has been in the past. For each job, the run metrics include the following:
- Run ID is an identifier created by AWS Glue for each run of this job.
- Retry attempt shows the number of attempts for jobs that required AWS Glue to automatically retry.
- Run status shows the success of each run listed with the most recent run at the top. If a job is Running or Starting, you can choose the action icon in this column to stop it.
- Error shows the details of an error message if the run was not successful.
- Logs links to the logs written to stdout for this job run. The Logs link takes you to Amazon CloudWatch Logs, where you can see all the details about the tables that were created in the AWS Glue Data Catalog and any errors that were encountered. You can manage your log retention period in the CloudWatch console. The default log retention is Never Expire. For more information about how to change the retention period, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
- Error logs links to the logs written to stderr for this job run. This link takes you to CloudWatch Logs, where you can see details about any errors that were encountered. You can manage your log retention period in the CloudWatch console. The default log retention is Never Expire. For more information, see Change log data retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide. A sketch showing how to read run history and change log retention programmatically follows this list.
- Execution time shows the length of time during which the job run consumed resources. The amount is calculated from when the job run starts consuming resources until it finishes.
- Timeout shows the maximum execution time during which this job run can consume resources before it stops and goes into timeout status.
- Delay shows the threshold before sending a job delay notification. When a job run's execution time reaches this threshold, AWS Glue sends a notification ("Glue Job Run Status") to CloudWatch Events.
- Triggered by shows the trigger that fired to start this job run.
- Start time shows the date and time (local time) that the job started.
- End time shows the date and time (local time) that the job ended.
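The fields in this run history correspond to fields returned by the GetJobRuns API, and log retention can be changed without the console as well. The following boto3 sketch assumes a hypothetical job name and retention period, and the default Spark job log groups /aws-glue/jobs/output and /aws-glue/jobs/error.

```python
import boto3

glue = boto3.client("glue")
logs = boto3.client("logs")

# Inspect the run history for a job (hypothetical job name). Each entry
# carries the fields shown on the History tab: Id, Attempt, JobRunState,
# ErrorMessage, StartedOn, CompletedOn, ExecutionTime, and Timeout.
for job_run in glue.get_job_runs(JobName="my-etl-job")["JobRuns"]:
    print(job_run["Id"], job_run["JobRunState"], job_run.get("ExecutionTime"))

# Change log retention from the default of Never Expire to 30 days
# (hypothetical period) for the standard Glue job log groups.
for group in ("/aws-glue/jobs/output", "/aws-glue/jobs/error"):
    logs.put_retention_policy(logGroupName=group, retentionInDays=30)
```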
For a specific job run, you can View run metrics, which displays graphs of metrics for the selected job run. For more information about how to turn on metrics and interpret the graphs, see Job monitoring and debugging.
Details
The Details tab includes attributes of your job. It shows you the details about the job definition and also lists the triggers that can start this job. Each time one of the triggers in the list fires, the job is started. For the list of triggers, the details include the following:
- Trigger name shows the names of triggers that start this job when fired.
- Trigger type lists the type of trigger that starts this job.
- Trigger status displays whether the trigger is created, activated, or deactivated.
- Trigger parameters shows parameters that define when the trigger fires.
- Jobs to trigger shows the list of jobs that start when this trigger fires.
The Details tab does not include source and target information. Review the script to see the source and target details.
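The same job attributes and trigger list can also be retrieved through the API. The following is a minimal boto3 sketch; the job name is a hypothetical placeholder.

```python
import boto3

glue = boto3.client("glue")
job_name = "my-etl-job"  # hypothetical job name

# Job definition attributes (role, command, connections, and so on).
job = glue.get_job(JobName=job_name)["Job"]
print(job["Role"], job["Command"]["ScriptLocation"])

# Triggers that can start this job.
triggers = glue.get_triggers(DependentJobName=job_name)["Triggers"]
for trigger in triggers:
    print(trigger["Name"], trigger["Type"], trigger.get("State"))
```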
Script
The Script tab shows the script that runs when your job is started. You can invoke an Edit script view from this tab. For more information about the script editor in the AWS Glue console, see Jobs (legacy). For information about the functions that are called in your script, see Program AWS Glue ETL scripts in PySpark.
Metrics
The Metrics tab shows metrics collected when a job runs and profiling is turned on. The following graphs are shown:
- ETL Data Movement
- Memory Profile: Driver and Executors
Choose View additional metrics to show the following graphs:
- ETL Data Movement
- Memory Profile: Driver and Executors
- Data Shuffle Across Executors
- CPU Load: Driver and Executors
- Job Execution: Active Executors, Completed Stages & Maximum Needed Executors
Data for these graphs is pushed to CloudWatch metrics if the job is configured to collect metrics. For more information about how to turn on metrics and interpret the graphs, see Job monitoring and debugging.
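When metrics are enabled, the same datapoints can be queried directly from CloudWatch. The following boto3 sketch reads one of the shuffle metrics listed below; the job name is a hypothetical placeholder, and it assumes the Glue metrics namespace with the JobName, JobRunId, and Type dimensions.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Bytes written during shuffle, aggregated across all runs of the job
# (JobRunId="ALL"); the job name is hypothetical.
now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace="Glue",
    MetricName="glue.driver.aggregate.shuffleBytesWritten",
    Dimensions=[
        {"Name": "JobName", "Value": "my-etl-job"},
        {"Name": "JobRunId", "Value": "ALL"},
        {"Name": "Type", "Value": "count"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```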
Example ETL data movement graph
The ETL Data Movement graph shows the following metrics:
- The number of bytes read from Amazon S3 by all executors (glue.ALL.s3.filesystem.read_bytes)
- The number of bytes written to Amazon S3 by all executors (glue.ALL.s3.filesystem.write_bytes)

Example Memory profile graph
The Memory Profile graph shows the following metrics:
- The fraction of memory used by the JVM heap (scale: 0–1) by the driver, an executor identified by executorId, or all executors (glue.driver.jvm.heap.usage, glue.executorId.jvm.heap.usage, glue.ALL.jvm.heap.usage)

Example Data shuffle across executors graph
The Data Shuffle Across Executors graph shows the following metrics:
- The number of bytes read by all executors to shuffle data between them (glue.driver.aggregate.shuffleLocalBytesRead)
- The number of bytes written by all executors to shuffle data between them (glue.driver.aggregate.shuffleBytesWritten)

Example CPU load graph
The CPU Load graph shows the following metrics:
- The fraction of CPU system load used (scale: 0–1) by the driver, an executor identified by executorId, or all executors (glue.driver.system.cpuSystemLoad, glue.executorId.system.cpuSystemLoad, glue.ALL.system.cpuSystemLoad)

Example Job execution graph
The Job Execution graph shows the following metrics:
- The number of actively running executors (glue.driver.ExecutorAllocationManager.executors.numberAllExecutors)
- The number of completed stages (glue.driver.aggregate.numCompletedStages)
- The number of maximum needed executors (glue.driver.ExecutorAllocationManager.executors.numberMaxNeededExecutors)
