Amazon EMR
Amazon EMR Release Guide


The Ganglia open-source project is a scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. Ganglia is also configured to ingest and visualize Hadoop and Spark metrics. For more information about the Ganglia open-source project, go to

When you view the Ganglia web UI in a browser, you see an overview of the cluster’s performance, with graphs detailing the load, memory usage, CPU utilization, and network traffic of the cluster. Below the cluster statistics are graphs for each individual server in the cluster.

Ganglia Release Information for This Release of Amazon EMR

Application: Ganglia 3.7.2
Amazon EMR Release Label:
Components installed with this application: emrfs, emr-goodies, ganglia-monitor, ganglia-metadata-collector, ganglia-web, hadoop-client, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, webserver


    Add Ganglia to a Cluster

    To add Ganglia to a cluster using the console

    1. Open the Amazon EMR console at

    2. Choose Create cluster.

    3. In Software configuration, choose All Applications, Core Hadoop, or Spark.

    4. Proceed with creating the cluster with configurations as appropriate.

    To add Ganglia to a cluster using the AWS CLI

    In the AWS CLI, you can add Ganglia to a cluster by using the create-cluster subcommand with the --applications parameter. If you specify only Ganglia using the --applications parameter, Ganglia is the only application installed.

    • Type the following command to add Ganglia when you create a cluster, replacing myKey with the name of your EC2 key pair.

      aws emr create-cluster --name "Spark cluster with Ganglia" --release-label emr-5.11.0 \
      --applications Name=Spark Name=Ganglia --ec2-attributes KeyName=myKey \
      --instance-type m3.xlarge --instance-count 3 --use-default-roles

      When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.
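The master/core split described above can also be written out explicitly with the --instance-groups parameter. The following is a minimal sketch rather than part of the original guide: the helper name instance_groups_for_count is hypothetical, and it only prints the --instance-groups arguments that correspond to a given instance count and instance type, without calling AWS.

```shell
# Hypothetical helper (not part of the AWS CLI): print the --instance-groups
# arguments equivalent to a given --instance-count and --instance-type.
# One instance becomes the master node; the remaining instances become core nodes.
instance_groups_for_count() {
  count=$1
  itype=$2
  if [ "$count" -lt 1 ]; then
    echo "instance count must be at least 1" >&2
    return 1
  fi
  core=$((count - 1))
  printf 'InstanceGroupType=MASTER,InstanceCount=1,InstanceType=%s' "$itype"
  if [ "$core" -gt 0 ]; then
    printf ' InstanceGroupType=CORE,InstanceCount=%s,InstanceType=%s' "$core" "$itype"
  fi
  printf '\n'
}

# Example: the 3-instance m3.xlarge cluster from the command above.
instance_groups_for_count 3 m3.xlarge
```

The printed arguments could be passed to create-cluster after --instance-groups in place of --instance-type and --instance-count.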


      If you have not previously created the default EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

      For more information about using Amazon EMR commands in the AWS CLI, see

    View Ganglia Metrics

    Ganglia provides a web-based user interface that you can use to view the metrics Ganglia collects. When you run Ganglia on Amazon EMR, the web interface runs on the master node and can be viewed using port forwarding, also known as creating an SSH tunnel. For more information about viewing web interfaces on Amazon EMR, see View Web Interfaces Hosted on Amazon EMR Clusters in the Amazon EMR Management Guide.

    To view the Ganglia web interface

    1. Use SSH to tunnel into the master node and create a secure connection. For information about how to create an SSH tunnel to the master node, see Option 2, Part 1: Set Up an SSH Tunnel to the Master Node Using Dynamic Port Forwarding in the Amazon EMR Management Guide.

    2. Install a web browser with a proxy tool, such as the FoxyProxy plug-in for Firefox, to create a SOCKS proxy for domains of the type *ec2**. For more information, see Option 2, Part 2: Configure Proxy Settings to View Websites Hosted on the Master Node in the Amazon EMR Management Guide.

    3. With the proxy set and the SSH connection open, you can view the Ganglia UI by opening http://master-public-dns-name/ganglia/ in a browser window, where master-public-dns-name is the public DNS address of the master node in the EMR cluster.

                    Ganglia cluster report
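Steps 1 and 3 can be sketched as a pair of small shell helpers. This is an illustration under assumptions, not the documented procedure: the function names, the local SOCKS port 8157, the key file path, and the hadoop login user are all assumptions chosen for the example. The helpers only print the command and URL; they do not open a connection.

```shell
# Hypothetical helpers: print (without running) the SSH tunnel command and the
# Ganglia URL for a given master public DNS name.

# Assumed local port for the SOCKS proxy; any unused local port should work.
SOCKS_PORT=8157

ganglia_tunnel_cmd() {
  key_file=$1
  master_dns=$2
  # -N: run no remote command; -D: dynamic (SOCKS) port forwarding.
  # The "hadoop" login user is an assumption for this sketch.
  printf 'ssh -i %s -N -D %s hadoop@%s\n' "$key_file" "$SOCKS_PORT" "$master_dns"
}

ganglia_url() {
  master_dns=$1
  printf 'http://%s/ganglia/\n' "$master_dns"
}

# Example with placeholder values; substitute your own key file and DNS name.
ganglia_tunnel_cmd ~/myKey.pem master-public-dns-name
ganglia_url master-public-dns-name
```

The browser proxy tool from step 2 would then need to point at the same local port that the tunnel command uses.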

    Hadoop and Spark Metrics in Ganglia

    Ganglia reports Hadoop metrics for each instance. The various types of metrics are prefixed by category: distributed file system (dfs.*), Java virtual machine (jvm.*), MapReduce (mapred.*), and remote procedure calls (rpc.*).

    Ganglia metrics for Spark are generally prefixed with either the Spark DAGScheduler or the YARN application ID. The prefixes take the following forms:

    • DAGScheduler.*

    • application_xxxxxxxxxx_xxxx.driver.*

    • application_xxxxxxxxxx_xxxx.executor.*
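The prefix conventions above can be expressed as a simple pattern match. The sketch below is illustrative only: classify_metric is a hypothetical helper, the category labels are taken from this section, and the metric names in the example calls are invented to show the prefix forms rather than copied from a real cluster.

```shell
# Hypothetical helper: map a Ganglia metric name to the category implied by
# its prefix, using only the prefixes listed in this section.
classify_metric() {
  case $1 in
    dfs.*)                      echo "distributed file system" ;;
    jvm.*)                      echo "Java virtual machine" ;;
    mapred.*)                   echo "MapReduce" ;;
    rpc.*)                      echo "remote procedure calls" ;;
    DAGScheduler.*)             echo "Spark DAGScheduler" ;;
    application_*_*.driver.*)   echo "Spark driver" ;;
    application_*_*.executor.*) echo "Spark executor" ;;
    *)                          echo "unknown" ;;
  esac
}

# Example calls with made-up metric names.
classify_metric dfs.example_metric
classify_metric application_1234567890_0001.driver.example_metric
```

A filter like this could be used to group metric names exported from Ganglia before charting them; names that match none of the listed prefixes fall through to "unknown".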