Amazon Elastic MapReduce
Developer Guide

Monitor HBase with Ganglia

The Ganglia open-source project is a scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. For more information about the Ganglia open-source project, see http://ganglia.info/. For more information about using Ganglia with Amazon EMR clusters, see Monitor Performance with Ganglia.

You configure Ganglia for HBase using the configure-hbase-for-ganglia bootstrap action. This bootstrap action configures HBase to publish metrics to Ganglia.

Note

You must configure HBase and Ganglia when you launch the cluster; Ganglia reporting cannot be added to a running cluster.

After the cluster is launched with Ganglia configured, you can access the Ganglia graphs and reports using the graphical interface running on the master node.

Ganglia also stores log files on the master node at /mnt/var/log/ganglia/rrds. If you configured your cluster to persist log files to an Amazon S3 bucket, the Ganglia log files are persisted there as well.

To configure a cluster for Ganglia and HBase using the AWS CLI

  • To launch a cluster and specify the configure-hbase-for-ganglia bootstrap action, type the following command and replace myKey with the name of your Amazon EC2 key pair.

    Note

    You can prefix the Amazon S3 bucket name with the region where your HBase cluster was launched, for example, s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia, where region is replaced by the name of that region. For more information about regions supported by Amazon EMR, see Choose an AWS Region.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 \
      --applications Name=Hue Name=Hive Name=Pig Name=HBase Name=Ganglia \
      --use-default-roles --ec2-attributes KeyName=myKey \
      --instance-type c1.xlarge --instance-count 3 --termination-protected \
      --bootstrap-action Path=s3://elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia

    • Windows users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig Name=HBase Name=Ganglia --use-default-roles --ec2-attributes KeyName=myKey --instance-type c1.xlarge --instance-count 3 --termination-protected --bootstrap-action Path=s3://elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia
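
The region-prefixed bootstrap path described in the note above could be assembled as in the following sketch. The region us-west-2 is only an illustrative assumption; substitute the region where your cluster runs. The script prints the resulting argument rather than launching anything, so you can check it before adding it to the create-cluster command.

```shell
# Assumption: us-west-2 is an example region; use your cluster's region.
REGION="us-west-2"

# Region-prefixed bucket, following the s3://region.elasticmapreduce/... pattern.
BOOTSTRAP_PATH="s3://${REGION}.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia"

# Print the argument to pass to aws emr create-cluster.
printf '%s\n' "--bootstrap-action Path=${BOOTSTRAP_PATH}"
```

Keeping the region in a variable makes it easy to reuse the same value elsewhere in a launch script.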

To view HBase metrics in the Ganglia web interface

  1. Use SSH to open a secure tunnel to the master node. For more information, see Option 2, Part 1: Set Up an SSH Tunnel to the Master Node Using Dynamic Port Forwarding.

  2. Install a web browser with a proxy tool, such as the FoxyProxy plug-in for Firefox, to create a SOCKS proxy for AWS domains. For more information, see Option 2, Part 2: Configure Proxy Settings to View Websites Hosted on the Master Node.

  3. With the proxy set and the SSH connection open, you can view the Ganglia metrics by opening a browser window with http://master-public-dns-name/ganglia/, where master-public-dns-name is the public DNS address of the master node in the HBase cluster.
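
Steps 1 and 3 above might look like the following sketch. The key file location ~/myKey.pem and the local port 8157 are assumptions chosen for illustration, and master-public-dns-name remains a placeholder for your master node's public DNS name. The script only prints the tunnel command and the Ganglia URL; it does not open a connection.

```shell
# Assumptions: the key file location and local port are examples only.
KEY_FILE="$HOME/myKey.pem"
MASTER_DNS="master-public-dns-name"   # replace with the master node's public DNS name
SOCKS_PORT=8157

# -N: do not run a remote command; -D: open a local SOCKS proxy on SOCKS_PORT.
TUNNEL_CMD="ssh -i ${KEY_FILE} -N -D ${SOCKS_PORT} hadoop@${MASTER_DNS}"
GANGLIA_URL="http://${MASTER_DNS}/ganglia/"

printf 'Run in a terminal:    %s\n' "${TUNNEL_CMD}"
printf 'Then open in browser: %s\n' "${GANGLIA_URL}"
```

The browser proxy tool from step 2 would need to point at the same local port as the -D option for the URL to resolve through the tunnel.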

To view Ganglia log files on the master node

  • If the cluster is still running, you can access the log files by using SSH to connect to the master node and navigating to the /mnt/var/log/ganglia/rrds directory. For more information, see Connect to the Master Node Using SSH.
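
As a rough sketch, the directory can also be listed in a single SSH command without an interactive session. The key file location ~/myKey.pem is an assumption, and master-public-dns-name is a placeholder. The script prints the command for review instead of running it.

```shell
# Assumptions: the key file location is an example; MASTER_DNS is a placeholder.
KEY_FILE="$HOME/myKey.pem"
MASTER_DNS="master-public-dns-name"
RRD_DIR="/mnt/var/log/ganglia/rrds"

# Build a command that logs in as the hadoop user and lists the Ganglia directory.
LIST_CMD="ssh -i ${KEY_FILE} hadoop@${MASTER_DNS} ls -l ${RRD_DIR}"

printf '%s\n' "${LIST_CMD}"
```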

To view Ganglia log files on Amazon S3

  • If you configured the cluster to persist log files to Amazon S3 when you launched it, the Ganglia log files are written there as well. Logs are written to Amazon S3 every five minutes, so there may be a slight delay before the latest log files are available. For more information, see View HBase Log Files.
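
One possible way to locate the persisted files from the command line is sketched below. The log location s3://mybucket/logs/ is hypothetical; use the location you specified when you launched the cluster. Because the exact layout of the Ganglia files under that location is not stated here, the sketch searches the listing instead of assuming a path. It prints the command rather than running it.

```shell
# Assumption: s3://mybucket/logs/ is a hypothetical log location; use the one
# you specified when launching the cluster.
LOG_URI="s3://mybucket/logs/"

# List every object under the log location; the printed pipeline then keeps
# only the entries whose key mentions ganglia.
S3_LIST_CMD="aws s3 ls ${LOG_URI} --recursive"

printf '%s | grep -i ganglia\n' "${S3_LIST_CMD}"
```

If the listing comes back empty, the five-minute write interval mentioned above is one thing to rule out before looking for other causes.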