Enabling the Apache Spark web UI for AWS Glue jobs

You can use the Apache Spark web UI to monitor and debug AWS Glue ETL jobs running on the AWS Glue job system. You can configure the Spark UI using the AWS Glue console or the AWS Command Line Interface (AWS CLI).

Every 30 seconds, AWS Glue backs up the Spark event logs to the Amazon S3 path that you specify.

Configuring the Spark UI (console)

Follow these steps to configure the Spark UI by using the AWS Management Console. When you create an AWS Glue job, the Spark UI is enabled by default.

To turn on the Spark UI when you create or edit a job
  1. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/.

  2. In the navigation pane, choose Jobs.

  3. Choose Add job, or select an existing job.

  4. In Job details, open Advanced properties.

  5. Under the Spark UI tab, choose Write Spark UI logs to Amazon S3.

  6. Specify an Amazon S3 path for storing the Spark event logs for the job. Note that if you use a security configuration in the job, the encryption also applies to the Spark UI log file. For more information, see Encrypting data written by AWS Glue.

  7. Under Spark UI logging and monitoring configuration:

    • Select Standard if you are generating logs to view in the AWS Glue console.

    • Select Legacy if you are generating logs to view on a Spark history server.

    • You can also choose to generate both.

Configuring the Spark UI (AWS CLI)

To generate logs that you can view with the Spark UI in the AWS Glue console, use the AWS CLI to pass the following job parameters to AWS Glue jobs. For more information, see AWS Glue job parameters.

'--enable-spark-ui': 'true', '--spark-event-logs-path': 's3://s3-event-log-path'

To distribute logs to their legacy locations, set the --enable-spark-ui-legacy-path parameter to "true". If you want logs only in the legacy format rather than in both formats, remove the --enable-spark-ui parameter.
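As a minimal sketch of how these parameters combine, the following Python helper builds the job-parameter dictionary for the standard format, the legacy format, or both. The helper name `spark_ui_arguments` and its `standard` and `legacy` flags are hypothetical and exist only for illustration; only the parameter keys and the placeholder path come from the text above.

```python
def spark_ui_arguments(event_log_path, standard=True, legacy=False):
    """Build AWS Glue job parameters for Spark UI logging.

    Hypothetical helper: only the parameter keys are taken from the
    documentation; the function name and flags are illustrative.
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event_log_path must be an Amazon S3 URI (s3://...)")
    if not (standard or legacy):
        raise ValueError("choose at least one of standard or legacy")

    # The event log path is needed in every case.
    args = {"--spark-event-logs-path": event_log_path}
    if standard:
        # Logs for viewing with the Spark UI in the AWS Glue console.
        args["--enable-spark-ui"] = "true"
    if legacy:
        # Logs distributed to their legacy locations.
        args["--enable-spark-ui-legacy-path"] = "true"
    return args


if __name__ == "__main__":
    import json

    # Both formats, using the placeholder path from the text above.
    print(json.dumps(spark_ui_arguments("s3://s3-event-log-path", legacy=True), indent=2))
```

A dictionary built this way could then be supplied wherever job parameters are accepted, for example as the arguments of a job run. That wiring is assumed here and not shown.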

Configuring the Spark UI for sessions using Notebooks

Warning

AWS Glue interactive sessions do not currently support the Spark UI in the console. To view the logs, configure a Spark history server instead.

If you use AWS Glue notebooks, set up the Spark UI configuration before you start the session. To do this, use the %%configure cell magic:

%%configure
{
  "--enable-spark-ui": "true",
  "--spark-event-logs-path": "s3://path"
}
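The configuration passed to %%configure is a JSON dictionary, so its keys and values need plain straight double quotes; curly quotation marks pasted from formatted text are not valid JSON. As a hedged sketch, the cell text can be generated with Python's `json` module instead of typed by hand. The function name `configure_cell` is hypothetical, and `s3://path` is the same placeholder used above.

```python
import json


def configure_cell(event_log_path):
    """Return the text of a %%configure cell that enables Spark UI logging.

    Hypothetical helper for illustration; json.dumps always emits
    straight double quotes, so the body is valid JSON.
    """
    config = {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }
    return "%%configure\n" + json.dumps(config, indent=2)


if __name__ == "__main__":
    # Paste the printed text into the first notebook cell,
    # before the session starts.
    print(configure_cell("s3://path"))
```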