When you’re running a cluster, you often want to track its progress and health. Amazon EMR records metrics that can help you monitor your cluster. It makes these metrics available in the Amazon EMR console and in the Amazon CloudWatch console, where you can track them with your other AWS metrics. In Amazon CloudWatch, you can set alarms to warn you if a metric goes outside parameters you specify.
Metrics are updated every five minutes. This interval is not configurable. Metrics are archived for two weeks; after that period, the data is discarded.
These metrics are automatically collected and pushed to Amazon CloudWatch for every Amazon EMR cluster. There is no charge for the Amazon EMR metrics reported in Amazon CloudWatch; they are provided as part of the Amazon EMR service.
Note
Viewing Amazon EMR metrics in Amazon CloudWatch is supported only for clusters launched with AMI 2.0.3 or later and running Hadoop 0.20.205 or later. For more information about selecting the AMI version for your cluster, see Choose a Machine Image.
The metrics reported by Amazon EMR provide information that you can analyze in different ways. The table below shows some common uses for the metrics. These are suggestions to get you started, not a comprehensive list. For the complete list of metrics reported by Amazon EMR, see Metrics Reported by Amazon EMR in Amazon CloudWatch.
| How do I? | Relevant Metrics |
|---|---|
| Track the progress of my cluster | Look at the RunningMapTasks, RemainingMapTasks, RunningReduceTasks, and RemainingReduceTasks metrics. |
| Detect clusters that are idle | The IsIdle metric tracks whether a cluster is live, but not currently running tasks. You can set an alarm to fire when the cluster has been idle for a given period of time, such as thirty minutes. |
| Detect when a node runs out of storage | The HDFSUtilization metric is the percentage of disk space currently used. If this rises above an acceptable level for your application, such as 80% of capacity used, you may need to resize your cluster and add more core nodes. |
There are many ways to access the metrics that Amazon EMR pushes to Amazon CloudWatch. You can view them through either the Amazon EMR console or Amazon CloudWatch console, or you can retrieve them using the Amazon CloudWatch CLI or the Amazon CloudWatch API. The following procedures show you how to access the metrics using these various tools.
To view metrics in the Amazon EMR console
Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
To view metrics for a cluster, click on the cluster to display the Job Flow Details pane.

Select the Monitoring tab to view information about that cluster. This loads the pane with reports about the progress and health of the cluster.

To view metrics in the Amazon CloudWatch console
Sign in to the AWS Management Console and open the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
In the navigation pane, click All Metrics.
Scroll down to the metric to graph. You can search on the cluster identifier of the cluster to monitor.

Click a metric to display the graph.

To access metrics from the Amazon CloudWatch CLI
Call mon-get-stats.
For more information, see the Amazon CloudWatch Developer Guide.
To access metrics from the Amazon CloudWatch API
Call GetMetricStatistics.
For more information, see the Amazon CloudWatch API Reference.
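As a rough illustration of the API procedure, the following Python sketch builds a GetMetricStatistics request for one cluster and sends it through a client object. It assumes the boto3 SDK (`boto3.client("cloudwatch")`) and that Amazon EMR metrics are published under the `AWS/ElasticMapReduce` namespace; the helper names `build_stats_request` and `fetch_datapoints` are hypothetical and not part of any AWS library.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Amazon EMR metrics are published in this CloudWatch namespace.
EMR_NAMESPACE = "AWS/ElasticMapReduce"


def build_stats_request(job_flow_id, metric_name, hours=1, now=None):
    """Build keyword arguments for a GetMetricStatistics call (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": EMR_NAMESPACE,
        "MetricName": metric_name,
        # Filter to a single cluster by its identifier.
        "Dimensions": [{"Name": "JobFlowId", "Value": job_flow_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # metrics are updated every five minutes
        "Statistics": ["Average"],
    }


def fetch_datapoints(client, job_flow_id, metric_name, hours=1):
    """Call GetMetricStatistics and return the datapoints sorted by timestamp."""
    response = client.get_metric_statistics(
        **build_stats_request(job_flow_id, metric_name, hours)
    )
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

With boto3 installed and credentials configured, a call such as `fetch_datapoints(boto3.client("cloudwatch"), "j-XXXXXXXXXXXXX", "HDFSUtilization")` would return the last hour of five-minute averages for that cluster.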
Amazon EMR pushes metrics to Amazon CloudWatch, which means you can use Amazon CloudWatch to set alarms on your Amazon EMR metrics. You can, for example, configure an alarm in Amazon CloudWatch to send you an email any time the HDFS utilization rises above 80%.
The following topics give you a high-level overview of how to set alarms using Amazon CloudWatch. For detailed instructions, see Using Amazon CloudWatch in the Amazon CloudWatch Developer Guide.
To set alarms using the Amazon CloudWatch console
Sign in to the AWS Management Console and open the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
Click the Create Alarm button. This launches the Create Alarm Wizard.

Scroll through the Amazon EMR metrics to locate the metric you want to place an alarm on. An easy way to display just the Amazon EMR metrics in this dialog box is to search on the cluster identifier of your cluster. Select the metric to create an alarm on and click Continue.

Fill in the Name, Description, Threshold, and Time values for the metric, and click Continue.

Choose Alarm as the alarm state. If you want Amazon CloudWatch to send you an email when the alarm state is reached, choose either a pre-existing Amazon SNS email subscription list or Create New Email Topic. If you select Create New Email Topic, you can set the name and email addresses for a new email subscription list. This list is saved and appears in the drop-down box for future alarms. Click Continue.
Note
If you use Create New Email Topic to create a new Amazon SNS topic, the email addresses must be verified before they receive notifications. Emails are only sent when the alarm enters an alarm state. If this alarm state change happens before the email addresses are verified, they do not receive a notification.

At this point, the Create Alarm Wizard gives you a chance to review the alarm you’re about to create. If you need to make any changes, you can use the Edit links on the right. Click Create Alarm.

Note
For more information about how to set alarms using the Amazon CloudWatch console, see Create an Alarm that Sends Email in the Amazon CloudWatch Developer Guide.
To set an alarm using the Amazon CloudWatch CLI
Call mon-put-metric-alarm.
For more information, see the Amazon CloudWatch Developer Guide.
To set an alarm using the Amazon CloudWatch API
Call PutMetricAlarm.
For more information, see the Amazon CloudWatch API Reference.
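The HDFS utilization example mentioned earlier (an alarm when utilization rises above 80%) could be expressed through PutMetricAlarm roughly as follows. This is a minimal sketch, assuming the boto3 SDK, the `AWS/ElasticMapReduce` namespace, and an existing Amazon SNS topic whose ARN you supply; the function names and the alarm naming scheme are hypothetical.

```python
def build_hdfs_alarm(job_flow_id, topic_arn, threshold_percent=80.0):
    """Build keyword arguments for a PutMetricAlarm call (hypothetical helper)."""
    return {
        "AlarmName": f"emr-{job_flow_id}-hdfs-utilization",  # illustrative name
        "AlarmDescription": f"HDFS utilization above {threshold_percent}%",
        "Namespace": "AWS/ElasticMapReduce",  # assumed EMR namespace
        "MetricName": "HDFSUtilization",
        "Dimensions": [{"Name": "JobFlowId", "Value": job_flow_id}],
        "Statistic": "Average",
        "Period": 300,  # one five-minute reporting interval
        "EvaluationPeriods": 1,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        # Notify the SNS topic when the alarm enters the ALARM state.
        "AlarmActions": [topic_arn],
    }


def create_hdfs_alarm(client, job_flow_id, topic_arn, threshold_percent=80.0):
    """Create or update the alarm through a CloudWatch client."""
    client.put_metric_alarm(
        **build_hdfs_alarm(job_flow_id, topic_arn, threshold_percent)
    )
```

Passing `boto3.client("cloudwatch")` as `client` would create the alarm in your account; email addresses subscribed to the topic still need to be verified before they receive notifications.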
The following table lists all of the metrics that Amazon EMR reports in the Amazon EMR console and pushes to Amazon CloudWatch.
Amazon EMR sends data for several metrics to Amazon CloudWatch. All Amazon EMR clusters automatically send metrics in five-minute intervals. Metrics are archived for two weeks; after that period, the data is discarded.
Note
Amazon EMR pulls metrics from a cluster. If a cluster becomes unreachable, no metrics will be reported until the cluster becomes available again.
| Metric | Description |
|---|---|
| | The number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists. Use Case: Monitor cluster health. Units: Count |
| | The number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. Use Case: Monitor cluster health. Units: Count |
| HBaseBackupFailed | Whether the last backup failed. This is set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters. Use Case: Monitor HBase backups. Units: Count |
| HBaseMostRecentBackupDuration | The amount of time it took the previous backup to complete. This metric is set regardless of whether the last completed backup succeeded or failed. While the backup is ongoing, this metric returns the number of minutes since the backup started. This metric is only reported for HBase clusters. Use Case: Monitor HBase backups. Units: Minutes |
| HBaseTimeSinceLastSuccessfulBackup | The number of elapsed minutes since the last successful HBase backup started on your cluster. This metric is only reported for HBase clusters. Use Case: Monitor HBase backups. Units: Minutes |
| | The number of bytes read from HDFS. Use Case: Analyze cluster performance, Monitor cluster progress. Units: Count |
| | The number of bytes written to HDFS. Use Case: Analyze cluster performance, Monitor cluster progress. Units: Count |
| HDFSUtilization | The percentage of HDFS storage currently used. Use Case: Analyze cluster performance. Units: Percent |
| IsIdle | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals, and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, raise an alarm only when this value has been 1 for more than one consecutive five-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. Use Case: Monitor cluster performance. Units: Count |
| | The number of jobs in the cluster that have failed. Use Case: Monitor cluster health. Units: Count |
| | The number of jobs in the cluster that are currently running. Use Case: Monitor cluster health. Units: Count |
| | The percentage of data nodes that are receiving work from Hadoop. Use Case: Monitor cluster health. Units: Percent |
| | The percentage of task trackers that are functional. Use Case: Monitor cluster health. Units: Percent |
| | The unused map task capacity. This is calculated as the maximum number of map tasks for a given cluster, less the total number of map tasks currently running in that cluster. Use Case: Analyze cluster performance. Units: Count |
| | The number of blocks for which HDFS has no replicas. These might be corrupt blocks. Use Case: Monitor cluster health. Units: Count |
| | The unused reduce task capacity. This is calculated as the maximum reduce task capacity for a given cluster, less the number of reduce tasks currently running in that cluster. Use Case: Analyze cluster performance. Units: Count |
| RemainingMapTasks | The number of remaining map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. A remaining map task is one that is not in any of the following states: Running, Killed, or Completed. Use Case: Monitor cluster progress. Units: Count |
| | The ratio of the total map tasks remaining to the total map slots available in the cluster. Use Case: Analyze cluster performance. Units: Ratio |
| RemainingReduceTasks | The number of remaining reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use Case: Monitor cluster progress. Units: Count |
| RunningMapTasks | The number of running map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use Case: Monitor cluster progress. Units: Count |
| RunningReduceTasks | The number of running reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use Case: Monitor cluster progress. Units: Count |
| | The number of bytes read from Amazon S3. Use Case: Analyze cluster performance, Monitor cluster progress. Units: Count |
| | The number of bytes written to Amazon S3. Use Case: Analyze cluster performance, Monitor cluster progress. Units: Count |
| | The number of task nodes waiting to be assigned. All of the task nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists. Use Case: Monitor cluster health. Units: Count |
| | The number of task nodes working. Data points for this metric are reported only when a corresponding instance group exists. Use Case: Monitor cluster health. Units: Count |
| | The total number of concurrent data transfers. Use Case: Monitor cluster health. Units: Count |
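The IsIdle guidance in the table (alarm only after the value has been 1 for several consecutive five-minute checks) can be sketched as a small check over a series of samples. This is an illustrative helper, not part of Amazon EMR or CloudWatch; the function name and the assumption that samples arrive oldest first, one per five-minute check, are hypothetical.

```python
import math


def idle_long_enough(is_idle_values, required_minutes=30, check_interval_minutes=5):
    """Return True if the most recent checks were all idle for the required time.

    is_idle_values: IsIdle samples ordered oldest to newest, one per check.
    """
    # Number of consecutive idle checks needed, for example 30 / 5 = 6.
    needed = math.ceil(required_minutes / check_interval_minutes)
    streak = 0
    # Count idle samples backward from the newest until a non-idle one appears.
    for value in reversed(is_idle_values):
        if value != 1:
            break
        streak += 1
    return streak >= needed
```

In a CloudWatch alarm, the same idea would typically be expressed by evaluating six consecutive 300-second periods rather than a single one, so that one idle sample on its own does not trigger the alarm.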
Amazon EMR data can be filtered using any of the dimensions in the following table.
| Dimension | Description |
|---|---|
| JobFlowId | The identifier for a cluster. You can find this value by clicking on the cluster in the Amazon EMR console. It takes the form j-XXXXXXXXXXXXX. |
| JobId | The identifier of a job within a cluster. You can use this to filter the metrics returned from a cluster down to those that apply to a single job within the cluster. JobId takes the form job_XXXXXXXXXXXX_XXXX. |
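As a sketch of how these dimensions might be passed to a CloudWatch request, the helper below builds a dimension list from a cluster identifier and an optional job identifier. The function name is hypothetical, and the prefix checks are a loose assumption based only on the forms shown in the table (`j-…` and `job_…`), not an official validation rule.

```python
def emr_dimensions(job_flow_id, job_id=None):
    """Build a CloudWatch dimension list for an EMR cluster and optional job."""
    # Loose sanity checks based on the identifier forms shown in the table.
    if not job_flow_id.startswith("j-"):
        raise ValueError("JobFlowId is expected to look like j-XXXXXXXXXXXXX")
    dimensions = [{"Name": "JobFlowId", "Value": job_flow_id}]
    if job_id is not None:
        if not job_id.startswith("job_"):
            raise ValueError("JobId is expected to look like job_XXXXXXXXXXXX_XXXX")
        # Narrow the metrics to a single job within the cluster.
        dimensions.append({"Name": "JobId", "Value": job_id})
    return dimensions
```

The returned list has the shape that CloudWatch SDK calls generally accept for a `Dimensions` parameter, so it can be reused for both retrieving statistics and defining alarms.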