Monitor HBase with Ganglia
The Ganglia open-source project is a scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. For more information about the Ganglia open-source project, see http://ganglia.info/. For more information about using Ganglia with Amazon EMR clusters, see Ganglia.
After the cluster is launched with Ganglia configured, you can access the Ganglia graphs and reports using the graphical interface running on the master node.
Ganglia also stores log files on the server at /var/log/ganglia/rrds. If you configured your cluster to persist log files to an Amazon S3 bucket, the Ganglia log files are persisted there as well.
To configure a cluster for Ganglia and HBase using the AWS CLI
Create the cluster with HBase and Ganglia installed using the AWS CLI:
aws emr create-cluster --name "Test cluster" --release-label emr-5.5.0 \
--applications Name=Ganglia --use-default-roles \
--ec2-attributes KeyName=
When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.
If you have not previously created the default Amazon EMR service role and Amazon EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.
For more information, see Amazon EMR commands in the AWS CLI.
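As a rough sketch of how the options described above fit together, the following script assembles a complete create-cluster command and prints it for review instead of running it. The key pair name, instance type, and instance count are hypothetical placeholders, and listing Name=HBase alongside Name=Ganglia is an assumption based on the step's statement that both applications are installed.

```shell
#!/bin/sh
# Hypothetical values; replace them with your own before use.
CLUSTER_NAME="Test cluster"
RELEASE_LABEL="emr-5.5.0"
KEY_NAME="myKey"          # assumed EC2 key pair name
INSTANCE_TYPE="m4.large"  # assumed instance type, applied to all nodes
INSTANCE_COUNT=3          # without --instance-groups: 1 master, the rest core nodes

# Print the full command on one line so it can be reviewed before running.
build_create_cluster_cmd() {
  printf '%s ' "aws emr create-cluster --name \"$CLUSTER_NAME\""
  printf '%s ' "--release-label $RELEASE_LABEL"
  printf '%s ' "--applications Name=HBase Name=Ganglia --use-default-roles"
  printf '%s ' "--ec2-attributes KeyName=$KEY_NAME"
  printf '%s\n' "--instance-type $INSTANCE_TYPE --instance-count $INSTANCE_COUNT"
}

build_create_cluster_cmd
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without AWS credentials; paste the output into a terminal once the placeholder values are correct.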
To view HBase metrics in the Ganglia web interface
Use SSH to tunnel into the master node and create a secure connection. For more information, see Option 2, Part 1: Set Up an SSH Tunnel to the Master Node Using Dynamic Port Forwarding in the Amazon EMR Management Guide.
Install a web browser with a proxy tool, such as the FoxyProxy plug-in for Firefox, to create a SOCKS proxy for AWS domains. For more information, see Option 2, Part 2: Configure Proxy Settings to View Websites Hosted on the Master Node in the Amazon EMR Management Guide.
With the proxy set and the SSH connection open, you can view the Ganglia metrics by opening a browser window with an address beginning http://master-public-dns-name, where master-public-dns-name is the public DNS address of the master server in the HBase cluster.
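The SSH tunnel from the first step can be sketched as follows. The key file path, the local port 8157, and the hadoop login name are assumptions that this section does not state; substitute the values for your own cluster. The script only prints the ssh command so that you can check it before running it.

```shell
#!/bin/sh
# Hypothetical values; replace them with your own before use.
KEY_FILE="$HOME/mykeypair.pem"       # assumed path to the private key file
MASTER_DNS="master-public-dns-name"  # public DNS address of the master node
LOCAL_PORT=8157                      # assumed unused local port for the SOCKS proxy

# Print an ssh command that opens a tunnel with dynamic port forwarding.
#   -N  do not execute a remote command (tunnel only)
#   -D  listen on LOCAL_PORT and forward connections as a SOCKS proxy
build_tunnel_cmd() {
  printf '%s\n' "ssh -i $KEY_FILE -N -D $LOCAL_PORT hadoop@$MASTER_DNS"
}

build_tunnel_cmd
```

The proxy tool in the browser must point at the same local port that the tunnel uses.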
To view Ganglia log files on the master node
If the cluster is still running, you can access the log files by using SSH to connect to the master node and navigating to the /var/log/ganglia/rrds directory. For more information, see Connect to the Master Node Using SSH in the Amazon EMR Management Guide.
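Once connected, a listing like the one below shows what is stored under that directory. This is a minimal sketch: list_rrd_files is a hypothetical helper name, and it assumes the directory holds round-robin database files with the .rrd extension, which this section does not state.

```shell
#!/bin/sh
# Directory to inspect; defaults to the Ganglia location named in this section.
RRD_DIR="${1:-/var/log/ganglia/rrds}"

# Print every .rrd file under the given directory, sorted by path.
list_rrd_files() {
  find "$1" -type f -name '*.rrd' | sort
}

if [ -d "$RRD_DIR" ]; then
  list_rrd_files "$RRD_DIR"
else
  echo "Directory not found: $RRD_DIR" >&2
fi
```

Run it on the master node, optionally passing a different directory as the first argument.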
To view Ganglia log files on Amazon S3
If you configured the cluster to persist log files to Amazon S3 when you launched it, the Ganglia log files are written there as well. Logs are written to Amazon S3 every five minutes, so there may be a slight delay before the latest log files are available. For more information, see View HBase Log Files.