Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Monitor HBase with Ganglia

Ganglia is an open-source, scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, and you can inspect the performance of individual nodes. For more information about the Ganglia open-source project, go to http://ganglia.info/. For more information about using Ganglia with Amazon EMR clusters, see Monitor Performance with Ganglia.

You configure HBase for Ganglia using the configure-hbase-for-ganglia bootstrap action, which sets up HBase to publish its metrics to Ganglia.

Note

You must configure HBase and Ganglia when you launch the cluster; Ganglia reporting cannot be added to a running cluster.

Once the cluster is launched with Ganglia reporting configured, you can access the Ganglia graphs and reports using the graphical interface running on the master node.

Ganglia also stores log files on the master node at /mnt/var/log/ganglia/rrds. If you configured your cluster to persist log files to an Amazon S3 bucket, the Ganglia log files are persisted there as well.

To configure a cluster for Ganglia and HBase using the AWS CLI

  • Launch the cluster and specify the configure-hbase-for-ganglia bootstrap action. This is shown in the following example.

    Note

    You can prefix the Amazon S3 bucket name with the region where your HBase cluster was launched, for example, s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia, where region is a placeholder for the region identifier. For a list of regions supported by Amazon EMR, see Choose an AWS Region.

    Type the following command to launch a cluster with Ganglia and HBase installed and to configure HBase for Ganglia using a bootstrap action:

    aws emr create-cluster --ami-version 3.1.1 --applications Name=HBase Name=Ganglia \
    --name "My HBase Cluster" --instance-count 5 --instance-type c1.xlarge \
    --bootstrap-actions Path=s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia

    Note

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes will use the instance type specified in the command.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
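As a minimal sketch of the two notes in this procedure, the following shell functions build the region-prefixed bootstrap action path and the --instance-groups arguments that correspond to a given instance count (one master node, with the remaining instances as core nodes). The function names and the us-west-2 region are illustrative assumptions and are not part of the AWS CLI. The sketch only prints strings and does not launch a cluster.

```shell
#!/bin/sh
# Sketch only: the function names and the example region are assumptions.

# Build the region-prefixed S3 path of the bootstrap action.
bootstrap_path() {
    # $1 = region where the cluster is launched, for example us-west-2
    printf 's3://%s.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia\n' "$1"
}

# Spell out the split that --instance-count implies:
# one master node, and the remaining instances as core nodes.
instance_groups_args() {
    # $1 = total instance count, $2 = instance type
    printf '%s InstanceGroupType=MASTER,InstanceCount=1,InstanceType=%s InstanceGroupType=CORE,InstanceCount=%s,InstanceType=%s\n' \
        "--instance-groups" "$2" "$(( $1 - 1 ))" "$2"
}

bootstrap_path us-west-2
instance_groups_args 5 c1.xlarge
```

If you prefer to state the node layout explicitly, you could pass the printed --instance-groups arguments to aws emr create-cluster in place of --instance-count and --instance-type.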

To configure an HBase cluster for Ganglia using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • Launch the cluster and specify both the install-ganglia and configure-hbase-for-ganglia bootstrap actions. This is shown in the following example.

    Note

    You can prefix the Amazon S3 bucket name with the region where your HBase cluster was launched, for example, s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia, where region is a placeholder for the region identifier. For a list of regions supported by Amazon EMR, see Choose an AWS Region.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --hbase --name "My HBase Cluster" \
          --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia \
          --bootstrap-action s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia

    • Windows users:

      ruby elastic-mapreduce --create --hbase --name "My HBase Cluster" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia --bootstrap-action s3://region.elasticmapreduce/bootstrap-actions/configure-hbase-for-ganglia

To view HBase metrics in the Ganglia web interface

  1. Use SSH to tunnel into the master node and create a secure connection. For information on how to create an SSH tunnel to the master node, see Option 2, Part 1: Set Up an SSH Tunnel to the Master Node Using Dynamic Port Forwarding.

  2. Configure a web browser with a proxy tool, such as the FoxyProxy plug-in for Firefox, to create a SOCKS proxy for AWS domains. For a tutorial on how to do this, see Option 2, Part 2: Configure Proxy Settings to View Websites Hosted on the Master Node.

  3. With the proxy set and the SSH connection open, view the Ganglia metrics by opening http://master-public-dns-name/ganglia/ in a browser window, where master-public-dns-name is the public DNS name of the master node in the HBase cluster. For information on how to locate the public DNS name of a master node, see To retrieve the public DNS name of the master node using the Amazon EMR console.
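The steps above can be sketched in shell as follows. This is an illustration under assumptions, not the only way to set up the tunnel: the key file name, the local SOCKS port 8157, and the helper function names are placeholders, and the functions print the command and URL instead of opening a connection. The ssh options used are -N (do not run a remote command) and -D (dynamic port forwarding); hadoop is the login user on the master node.

```shell
#!/bin/sh
# Sketch only: key path, SOCKS port, and host name are placeholder assumptions.

# Print the ssh command that opens a SOCKS proxy through the master node.
# -N: do not run a remote command; -D: local port for dynamic port forwarding.
tunnel_command() {
    # $1 = private key file, $2 = local SOCKS port, $3 = master public DNS name
    printf 'ssh -i %s -N -D %s hadoop@%s\n' "$1" "$2" "$3"
}

# Print the URL of the Ganglia web interface on the master node.
ganglia_url() {
    # $1 = master public DNS name
    printf 'http://%s/ganglia/\n' "$1"
}

tunnel_command '~/mykeypair.pem' 8157 master-public-dns-name
ganglia_url master-public-dns-name
```

With the tunnel running in one terminal, a quick check from another terminal might be curl --socks5-hostname localhost:8157 followed by the printed URL, assuming curl is installed and the same port is used.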

To view Ganglia log files on the master node

  • If the cluster is still running, you can access the log files by using SSH to connect to the master node and navigating to the /mnt/var/log/ganglia/rrds directory. For information about how to use SSH to connect to the master node, see Connect to the Master Node Using SSH.
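After you connect to the master node, a small helper such as the following could list the files in the Ganglia log directory. This is a sketch: the function name and the optional directory argument are assumptions added for illustration, and only the /mnt/var/log/ganglia/rrds path comes from this guide.

```shell
#!/bin/sh
# Sketch only: intended to be run on the master node after connecting with SSH.
# The helper name and the optional directory argument are assumptions.

GANGLIA_LOG_DIR=/mnt/var/log/ganglia/rrds

# List the files under the Ganglia log directory, or report that it is missing.
list_ganglia_logs() {
    # $1 = directory to inspect (optional, defaults to GANGLIA_LOG_DIR)
    dir=${1:-$GANGLIA_LOG_DIR}
    if [ -d "$dir" ]; then
        find "$dir" -type f | sort
    else
        echo "Directory not found: $dir" >&2
        return 1
    fi
}

# On a machine without Ganglia this prints an error and continues.
list_ganglia_logs || true
```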

To view Ganglia log files on Amazon S3

  • If you configured the cluster to persist log files to Amazon S3 when you launched it, the Ganglia log files are written there as well. Logs are written to Amazon S3 every five minutes, so there may be a slight delay before the latest log files are available. For more information, see View HBase Log Files.
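One way to look for the persisted files is to list the cluster's log location with the AWS CLI and filter the result. The following is a sketch under assumptions: the s3://mybucket/logs/ location and the function names are hypothetical, and because this guide does not specify where Ganglia files sit under the log location, the filter simply matches the word "ganglia" in each listed key. The listing function requires the AWS CLI and credentials, so the example call is left commented out.

```shell
#!/bin/sh
# Sketch only: the bucket name, prefix, and helper names are hypothetical.

# Keep only listing lines that mention Ganglia (case-insensitive).
# "|| true" keeps the exit status zero when nothing matches.
filter_ganglia() {
    grep -i 'ganglia' || true
}

# List everything under the cluster's log location and filter it.
list_ganglia_logs_s3() {
    # $1 = log location, for example s3://mybucket/logs/ (hypothetical)
    aws s3 ls "$1" --recursive | filter_ganglia
}

# Example (uncomment to run against your own log location):
# list_ganglia_logs_s3 s3://mybucket/logs/
```

Because logs are copied every five minutes, an empty result shortly after launch does not necessarily mean that logging is misconfigured.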