Amazon EMR
Management Guide

Access Spark History Server UI from the Console

With Amazon EMR versions 5.25.0 and later, you can connect to Spark history server UI from the cluster Summary page or the Application history tab in the console without setting up a web proxy through an SSH connection. Accessing Spark history server UI from the console provides the following benefits:

  • You can quickly analyze and troubleshoot active jobs and job history by viewing the details of Spark execution history and accessing relevant log files.

  • You can access Spark history and debug applications even after the cluster is terminated. The logs are available for active clusters and are retained for 30 days after the cluster is terminated.

If you use a private subnet for your cluster, make sure to include “arn:aws:s3:::prod.MyRegion.appinfo.src/*” in the resource list of the Amazon S3 policy for the private subnet. For more information, see Minimum Amazon S3 Policy for Private Subnet.
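As a minimal sketch, the resource string above can be generated for a specific Region with a small shell helper. The function name appinfo_arn is hypothetical, and the sketch assumes that "MyRegion" in the ARN is a placeholder for the Region code of your cluster (for example, us-west-2). It only prints the ARN; you still add the value to the resource list of your own policy.

```shell
#!/bin/sh
# Hypothetical helper: print the application-info ARN from this guide
# for one Region. Assumes "MyRegion" stands for the cluster's Region code.
appinfo_arn() {
  region="$1"
  if [ -z "$region" ]; then
    echo "usage: appinfo_arn <region>" >&2
    return 1
  fi
  printf 'arn:aws:s3:::prod.%s.appinfo.src/*\n' "$region"
}

# Example: print the ARN to add to the policy resource list for us-west-2.
appinfo_arn us-west-2
```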

To access YARN container logs from the Spark history server UI, you must enable logging to Amazon S3 for your cluster. For more information, see Configure Cluster Logging and Debugging.
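One way to enable logging to Amazon S3 is the --log-uri option of aws emr create-cluster. The following dry-run sketch only prints such a command so you can review it before running it. The helper name, the bucket amzn-s3-demo-bucket, the cluster name, and the instance settings are placeholders chosen for illustration and do not come from this guide.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not execute) a create-cluster
# command with logging to Amazon S3 enabled through --log-uri.
# The cluster name and instance settings below are placeholders.
print_create_cluster_with_logging() {
  log_uri="$1"
  case "$log_uri" in
    s3://?*) ;;  # accept only S3 URIs
    *) echo "log URI must look like s3://bucket/prefix/" >&2; return 1 ;;
  esac
  printf 'aws emr create-cluster --name "Spark with S3 logging" --release-label emr-5.28.0 --applications Name=Spark --use-default-roles --instance-type m3.xlarge --instance-count 3 --log-uri %s\n' "$log_uri"
}

# Example: review the command for a placeholder bucket.
print_create_cluster_with_logging s3://amzn-s3-demo-bucket/emr-logs/
```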

Event Logs Collection

Amazon EMR collects Spark event logs into an EMR system bucket to enable Spark history server UI access from the console. The event logs are encrypted at rest using Server-Side Encryption with Amazon S3 Managed Keys (SSE-S3). If you need to disable this feature for privacy reasons, you can stop the daemon by using a bootstrap script when you create a cluster, as the following example demonstrates.

aws emr create-cluster --name "Stop SparkUI Support" --release-label emr-5.28.0 \
  --applications Name=Hadoop Name=Spark \
  --ec2-attributes KeyName=keyname \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=TASK,InstanceCount=1,InstanceType=m3.xlarge \
  --use-default-roles \
  --bootstrap-actions Path=s3://elasticmapreduce/bootstrap-actions/run-if,Args=["instance.isMaster=true","echo Stop Spark UI | sudo tee /etc/apppusher/run-apppusher"]

After you run this bootstrap script, Amazon EMR will not collect any Spark event logs into the EMR system bucket. No application history information will be available on the Application history tab, and you will lose access to the Spark history server UI from the console.

Considerations and Limitations

This feature currently has the following limitations:

  • Accessing Spark history server UI from the console is currently not available for EMR clusters with multiple master nodes or for EMR clusters integrated with AWS Lake Formation.

  • To access Spark history server UI from the console, you must have permission to the ListSteps action for EMR. If you deny an IAM principal's permission to this action, it takes approximately five minutes for the permission change to propagate.

  • If you reconfigure Spark applications in a running cluster, the application history will not be available through the Spark history server UI.

  • For each AWS account, the number of active Spark history server UIs cannot exceed 50.

  • You can access Spark history server UI from the console in the US East (N. Virginia and Ohio), US West (N. California and Oregon), Canada (Central), EU (Frankfurt, Ireland, and London), and Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo) Regions.
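The ListSteps requirement in the list above could be met with an IAM policy statement along the following lines. This is a sketch under stated assumptions, not a recommended policy: the helper name is hypothetical, and "Resource": "*" is an assumption made for brevity that you may want to narrow to specific clusters. The action name uses the elasticmapreduce prefix that Amazon EMR actions carry in IAM.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal IAM policy document that allows
# the ListSteps action for Amazon EMR.
# Assumption: Resource "*" is used for brevity; scope it down as needed.
print_liststeps_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "elasticmapreduce:ListSteps",
      "Resource": "*"
    }
  ]
}
EOF
}

# Example: write the policy to stdout for review.
print_liststeps_policy
```

As noted in the list, a change that denies this action takes approximately five minutes to propagate.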

Access Application History through Spark History Server UI

On the Application history tab or the cluster Summary page for your cluster in the Amazon EMR console, choose the Spark history server UI link.

The Spark history server UI opens in a new browser tab. This web interface displays the same information as the open source Spark HistoryServer UI that you would otherwise reach by setting up a web proxy through an SSH connection. For more information, see Monitoring and Instrumentation.

You can view YARN container logs through the links on the Spark history server UI.

Note

To access YARN container logs from the Spark history server UI, you must enable logging to Amazon S3 for your cluster. If logging is not enabled, the links to YARN container logs will not work.