Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Monitor HBase with Ganglia

Ganglia is an open source, scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual nodes. For more information about the Ganglia open source project, go to http://ganglia.info/. For more information about using Ganglia with Amazon EMR clusters, see Monitor Performance with Ganglia.

To monitor HBase with Ganglia on an Amazon EMR cluster, you specify two bootstrap actions. The first, install-ganglia, installs Ganglia. The second, configure-hbase-for-ganglia, configures HBase to publish metrics to Ganglia.

Note

You must specify these bootstrap actions when you launch the HBase cluster; Ganglia reporting cannot be added to an HBase cluster that is already running.

After the HBase cluster launches with Ganglia reporting configured, you can use port forwarding to access the Ganglia graphs and reports.

Ganglia also stores log files on the master node at /mnt/var/log/ganglia/rrds. If you configured your cluster to persist log files to an Amazon S3 bucket, the Ganglia log files are persisted there as well.

To configure an HBase cluster for Ganglia

  • Launch the cluster and specify both the install-ganglia and configure-hbase-for-ganglia bootstrap actions. This is shown in the following example.

    Note

    You can prefix the Amazon S3 bucket name with the region where your HBase cluster was launched, for example s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia, where region is a placeholder for the region identifier. For a list of regions supported by Amazon EMR, see Choose an AWS Region.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --hbase --name "My HBase Cluster" \
          --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia \
          --bootstrap-action s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia

    • Windows users:

      ruby elastic-mapreduce --create --hbase --name "My HBase Cluster" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia --bootstrap-action s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia
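
The launch step above can be sketched as a small shell script that assembles the two bootstrap action paths and prints the resulting command for review instead of running it. The helper names bootstrap_action_uri and print_launch_command are hypothetical, us-west-2 is only an example region, and the sketch assumes both bootstrap actions are available under the region-prefixed bucket. It uses only the CLI options shown in the example above.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of the Amazon EMR CLI.

# Build the S3 URI of a bootstrap action. An empty region yields the
# unprefixed bucket (s3://elasticmapreduce/...) used in the example above.
bootstrap_action_uri() {
  local region="$1" action="$2"
  local bucket="elasticmapreduce"
  if [ -n "$region" ]; then
    bucket="${region}.elasticmapreduce"
  fi
  printf 's3://%s/bootstrap-actions/%s\n' "$bucket" "$action"
}

# Print (do not run) the launch command so it can be reviewed first.
print_launch_command() {
  local region="$1" name="$2"
  printf './elastic-mapreduce --create --hbase --name "%s" \\\n' "$name"
  printf '    --bootstrap-action %s \\\n' "$(bootstrap_action_uri "$region" install-ganglia)"
  printf '    --bootstrap-action %s\n' "$(bootstrap_action_uri "$region" configure-hbase-for-ganglia)"
}

# Example: us-west-2 is an assumed region; substitute your own.
print_launch_command "us-west-2" "My HBase Cluster"
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Once the output looks right, it can be pasted into the directory where the Amazon EMR CLI is installed.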

To view HBase metrics in the Ganglia web interface

  1. Use SSH to tunnel into the master node and create a secure connection. For information on how to create an SSH tunnel to the master node, see Open an SSH Tunnel to the Master Node.

  2. Configure your web browser with a proxy tool, such as the FoxyProxy plug-in for Firefox, to use a SOCKS proxy for domains that match the pattern *ec2*.amazonaws.com*. For a tutorial on how to do this, see Configure FoxyProxy to View Websites Hosted on the Master Node.

  3. With the proxy set and the SSH connection open, you can view the Ganglia metrics by opening http://master-public-dns-name/ganglia/ in a browser window, where master-public-dns-name is the public DNS name of the master node in the HBase cluster. For information on how to locate the public DNS name of a master node, see To locate the public DNS name of the master node using the Amazon EMR console.
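
As a rough illustration of the steps above, the following sketch prints an ssh command that opens a SOCKS proxy through the master node, followed by the Ganglia URL to open in the browser. The helper names, the key file path, the hadoop login name, and local port 8157 are assumptions made for this example. Follow the linked SSH tunnel topic for the values that apply to your cluster.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, key path, login name, and port are illustrative assumptions.

# Build the Ganglia web interface URL from the master node's public DNS name.
ganglia_url() {
  printf 'http://%s/ganglia/\n' "$1"
}

# Print (do not run) an ssh command that opens a SOCKS proxy on a local port.
#   -i  private key file used to authenticate
#   -N  do not run a remote command, only forward
#   -D  dynamic (SOCKS) port forwarding on the given local port
print_tunnel_command() {
  local key_file="$1" master_dns="$2" local_port="${3:-8157}"
  printf 'ssh -i %s -N -D %s hadoop@%s\n' "$key_file" "$local_port" "$master_dns"
}

master_dns="master-public-dns-name"   # placeholder from the procedure above
print_tunnel_command "$HOME/mykeypair.pem" "$master_dns"
ganglia_url "$master_dns"
```

The proxy tool from step 2 would then point at the same local port that the tunnel listens on.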

To view Ganglia log files on the master node

  • If the cluster is still running, you can access the log files by using SSH to connect to the master node and navigating to the /mnt/var/log/ganglia/rrds directory. For information about how to use SSH to connect to the master node, see Connect to the Master Node Using SSH.
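
Once connected to the master node, a short script along these lines could list what Ganglia has written under that directory. The list_rrd_files helper is hypothetical, and the .rrd file suffix is an assumption based on Ganglia's use of round-robin database files. Adjust the pattern if your files are named differently.

```shell
#!/usr/bin/env bash
# Sketch only: list_rrd_files is a hypothetical helper, and the .rrd suffix is an assumption.

# List round-robin database files under a directory, sorted by path.
# Defaults to the Ganglia directory named in the procedure above.
list_rrd_files() {
  local rrd_dir="${1:-/mnt/var/log/ganglia/rrds}"
  find "$rrd_dir" -type f -name '*.rrd' | sort
}

rrd_dir="/mnt/var/log/ganglia/rrds"
if [ -d "$rrd_dir" ]; then
  # Show only the first entries; a busy cluster can have many metric files.
  list_rrd_files "$rrd_dir" | head -n 20
else
  echo "Directory not found: $rrd_dir (run this on the master node)" >&2
fi
```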

To view Ganglia log files on Amazon S3

  • If you configured the cluster to persist log files to Amazon S3 when you launched it, the Ganglia log files are written there as well. Logs are written to Amazon S3 every five minutes, so the latest log files may take a few minutes to become available. For more information, see View HBase Log Files.