Amazon EMR
Management Guide

What Tools are Available for Troubleshooting?

There are several tools you can use to gather information about your cluster to help determine what went wrong. Some require that you initialize them when you launch the cluster; others are available for every cluster.

Tools to Display Cluster Details

You can use any of the Amazon EMR interfaces (console, CLI or API) to retrieve detailed information about a cluster. For more information, see View Cluster Details.

Amazon EMR Console Details Pane

The console displays all of the clusters you've launched in the past two weeks, regardless of whether they are active or terminated. If you click on a cluster, the console displays a details pane with information about that cluster.

Amazon EMR Command Line Interface

You can locate details about a cluster from the CLI using the --describe argument.

Amazon EMR API

You can locate details about a cluster from the API using the DescribeJobFlows action.
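
As a rough illustration of pulling troubleshooting details through the API, the sketch below assumes the boto3 SDK for Python and its describe_cluster call, which is the newer counterpart to the DescribeJobFlows action. The helper name summarize_cluster and the choice of fields are illustrative assumptions, not part of Amazon EMR. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def summarize_cluster(emr_client, cluster_id):
    """Return the cluster fields that are usually most useful when troubleshooting.

    emr_client is assumed to behave like boto3.client("emr"); only its
    describe_cluster method is used.
    """
    # describe_cluster returns a dict with a top-level "Cluster" key.
    cluster = emr_client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    status = cluster.get("Status", {})
    reason = status.get("StateChangeReason", {})
    return {
        "id": cluster.get("Id"),
        "name": cluster.get("Name"),
        "state": status.get("State"),
        "reason_code": reason.get("Code"),
        "reason_message": reason.get("Message"),
        "log_uri": cluster.get("LogUri"),
    }


# Possible usage, assuming boto3 is installed and credentials are configured
# (the cluster ID below is a placeholder):
#   import boto3
#   print(summarize_cluster(boto3.client("emr"), "j-XXXXXXXXXXXXX"))
```

The state and state-change reason are often the quickest indication of why a cluster stopped, and the log URI shows whether logs were archived to Amazon S3.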

Tools to View Log Files

Amazon EMR and Hadoop both generate log files as the cluster runs. You can access these log files from several different tools, depending on the configuration you specified when you launched the cluster. For more information, see Configure Cluster Logging and Debugging.

Log Files on the Master Node

Every cluster publishes log files to the /mnt/var/log/ directory on the master node. These log files are available only while the cluster is running.
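
When you are logged in to the master node, it can help to see which log files changed most recently. The following is a minimal sketch using only the Python standard library; the function name newest_log_files and the default limit are assumptions made for illustration.

```python
import os


def newest_log_files(log_root="/mnt/var/log", limit=10):
    """Return up to `limit` (path, mtime) pairs, most recently modified first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(log_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                found.append((path, os.path.getmtime(path)))
            except OSError:
                # The file was rotated or removed while the walk was in progress.
                continue
    # Sort by modification time, newest first.
    found.sort(key=lambda item: item[1], reverse=True)
    return found[:limit]
```

Calling newest_log_files() with no arguments scans /mnt/var/log; pass a different directory to scan a local copy of the logs instead.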

Log Files Archived to Amazon S3

If you launch the cluster and specify an Amazon S3 log path, the cluster copies the log files stored in /mnt/var/log/ on the master node to Amazon S3 at 5-minute intervals. This ensures that you have access to the log files even after the cluster is terminated. Because the files are archived at 5-minute intervals, log data from the last few minutes before a sudden termination may not be available.
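
To make the archiving gap concrete, the sketch below estimates how much log time might be missing from Amazon S3 after a sudden termination. It rests on a simplifying assumption that is not stated in this guide: that archives happen at exact 5-minute multiples counted from a known first archive time. The function name unarchived_window is hypothetical.

```python
from datetime import datetime, timedelta

# Archive interval as described in the text above.
ARCHIVE_INTERVAL = timedelta(minutes=5)


def unarchived_window(first_archive, terminated_at, interval=ARCHIVE_INTERVAL):
    """Estimate the span of log time written after the last archive to Amazon S3.

    Assumption: archives occur at exact multiples of `interval` after
    `first_archive`. Real timing may drift, so treat the result as a rough guide.
    """
    elapsed = terminated_at - first_archive
    if elapsed < timedelta(0):
        raise ValueError("termination time precedes the first archive time")
    # Time since the most recent interval boundary.
    return elapsed % interval
```

For example, with a first archive at 12:00:00 and a termination at 12:13:30, the last archive under this assumption was at 12:10:00, so the function returns 3 minutes 30 seconds. The result is always shorter than one interval.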

Tools to Monitor Cluster Performance

Amazon EMR provides several tools to monitor the performance of your cluster.

Hadoop Web Interfaces

Every cluster publishes a set of web interfaces on the master node that contain information about the cluster. You can access these web pages by using an SSH tunnel to connect to them on the master node. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.
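
One common form of SSH tunnel is dynamic port forwarding, which opens a local SOCKS proxy that a browser can use to reach the web interfaces. The sketch below only builds the ssh command line; it assumes the OpenSSH client, the default hadoop login on the master node, and an arbitrary unused local port (8157 here). The function name and the example host name and key path are placeholders.

```python
def build_tunnel_command(master_dns, key_path, local_port=8157, user="hadoop"):
    """Return the argv list for an SSH dynamic port forward to the master node."""
    return [
        "ssh",
        "-i", key_path,         # private key for the cluster's EC2 key pair
        "-N",                   # forward ports only, do not run a remote command
        "-D", str(local_port),  # open a local SOCKS proxy on this port
        f"{user}@{master_dns}",
    ]


# Possible usage (host name and key path are placeholders):
#   import subprocess
#   subprocess.run(build_tunnel_command(
#       "ec2-xx-xx-xx-xx.compute-1.amazonaws.com", "/path/to/key.pem"))
```

Returning a list rather than a single string avoids shell quoting problems when the key path contains spaces. The browser still has to be configured to use the local SOCKS proxy.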

CloudWatch Metrics

Every cluster reports metrics to CloudWatch. CloudWatch is a web service that tracks metrics and lets you set alarms on them. For more information, see Monitor Metrics with CloudWatch.
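
As an illustration of reading one of these metrics programmatically, the sketch below builds the parameters for a CloudWatch GetMetricStatistics request. It assumes the boto3 SDK for Python, the AWS/ElasticMapReduce namespace, the IsIdle metric, and the JobFlowId dimension. The function name idle_metric_request and the one-hour lookback are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def idle_metric_request(cluster_id, end_time, lookback=timedelta(hours=1)):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    The request asks for the average of the IsIdle metric for one cluster,
    in 5-minute periods, over the `lookback` window ending at `end_time`.
    """
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "IsIdle",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end_time - lookback,
        "EndTime": end_time,
        "Period": 300,  # seconds
        "Statistics": ["Average"],
    }


# Possible usage, assuming boto3 is installed and credentials are configured
# (the cluster ID below is a placeholder):
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   request = idle_metric_request("j-XXXXXXXXXXXXX", datetime.now(timezone.utc))
#   datapoints = cloudwatch.get_metric_statistics(**request)["Datapoints"]
```

Keeping the request construction separate from the network call makes the parameters easy to inspect and to reuse for other metric names.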