Job Monitoring and Debugging - AWS Glue

You can collect metrics about AWS Glue jobs and visualize them on the AWS Glue and Amazon CloudWatch consoles to identify and fix issues. Profiling your AWS Glue jobs requires the following steps:

  1. Enable the Job metrics option in the job definition. You can enable profiling in the AWS Glue console or by passing a parameter to the job. For more information, see Defining Job Properties for Spark Jobs or Special Parameters Used by AWS Glue.

  2. Confirm that the job script initializes a GlueContext. For example, the following script snippet initializes a GlueContext and shows where profiled code is placed in the script. This general format is used in the debugging scenarios that follow.

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    import time

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    ...
    ...
    code-to-profile
    ...
    ...

    job.commit()

  3. Run the job.

  4. Visualize the metrics on the AWS Glue console and identify abnormal metrics for the driver or an executor.

  5. Narrow down the root cause using the identified metric.

  6. Optionally, confirm the root cause using the log stream of the identified driver or job executor.
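
Steps 1 and 3 can also be done programmatically. The following is a minimal sketch, not a definitive implementation. It assumes that the boto3 SDK is installed with credentials configured, and that metrics are enabled through the `--enable-metrics` special parameter passed as a key with an empty value. The helper names `build_profiling_arguments` and `start_profiled_run` are hypothetical and exist only for this illustration.

```python
def build_profiling_arguments(extra_args=None):
    """Return job-run arguments with metrics collection switched on."""
    args = dict(extra_args or {})
    # Assumption: --enable-metrics is a key-only flag, so its value is left empty.
    args["--enable-metrics"] = ""
    return args


def start_profiled_run(job_name, extra_args=None, glue_client=None):
    """Start a job run with profiling enabled and return its run ID."""
    if glue_client is None:
        # Imported lazily so build_profiling_arguments works without boto3.
        import boto3
        glue_client = boto3.client("glue")
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments=build_profiling_arguments(extra_args),
    )
    return response["JobRunId"]
```

For example, `start_profiled_run("my-etl-job")` would start a run of a job named `my-etl-job` (a placeholder name) with metrics enabled for that run only. Passing the parameter per run leaves the job definition unchanged, which can be convenient when profiling is needed only while debugging.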