Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

HBase Cluster Prerequisites

To run HBase, an Amazon EMR cluster should meet the following requirements. Items marked Optional or Recommended are not strictly required.

  • A version of the Amazon EMR command line interface (CLI) that supports HBase (Optional)—CLI version 2012-06-12 and later. To find out what version of the CLI you have, run elastic-mapreduce --version at the command line. For more information about the Amazon EMR CLI and how to install it, see the Command Line Interface Reference for Amazon EMR. If you do not have the latest version of the CLI installed, you can use the Amazon EMR console to launch HBase clusters.

  • At least two instances (Optional)—The cluster's master node runs the HBase master server and ZooKeeper, and the slave nodes run the HBase region servers. For best performance, run HBase clusters on at least two EC2 instances; a single node is sufficient for evaluation purposes.

  • Persistent cluster—HBase only runs on persistent clusters. The CLI and Amazon EMR console automatically create HBase clusters with the --alive flag set.

  • An Amazon EC2 key pair set (Recommended)—To use the Secure Shell (SSH) network protocol to connect with the master node and run HBase shell commands, you must set an Amazon EC2 key pair when you create the cluster.

  • The correct instance type—HBase is only supported on the following instance types: m1.large, m1.xlarge, c1.xlarge, m2.2xlarge, m2.4xlarge, cc1.4xlarge, cc2.8xlarge, hi1.4xlarge, or hs1.8xlarge.

    The cc2.8xlarge instance type is only supported in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) regions. The cc1.4xlarge and hs1.8xlarge instance types are only supported in the US East (Northern Virginia) region. The hi1.4xlarge instance type is only supported in the US East (Northern Virginia) and EU (Ireland) regions.

  • The correct AMI and Hadoop versions—HBase clusters are currently supported only on Hadoop 0.20.205 or later. The CLI and Amazon EMR console automatically set the correct AMI on HBase clusters.

  • Ganglia (Optional)—If you want to monitor HBase performance metrics, you can use a bootstrap action to install Ganglia when you create the cluster.

  • An Amazon S3 bucket for logs (Optional)—The logs for HBase are available on the master node. If you'd like these logs copied to Amazon S3, specify an Amazon S3 bucket to receive log files when you create the cluster.
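
The instance type, region, instance count, and CLI version requirements above can be expressed as a small pre-launch check. The following Python sketch is illustrative only and is not part of the Amazon EMR CLI or API. The function name check_hbase_prerequisites and its parameters are hypothetical, the region codes (us-east-1 for US East (Northern Virginia), us-west-2 for US West (Oregon), eu-west-1 for EU (Ireland)) are substituted for the region names used in this guide, and the CLI version is assumed to be passed as a date string such as 2012-06-12.

```python
from datetime import date

# Instance types listed in this guide as supported for HBase.
SUPPORTED_INSTANCE_TYPES = {
    "m1.large", "m1.xlarge", "c1.xlarge", "m2.2xlarge", "m2.4xlarge",
    "cc1.4xlarge", "cc2.8xlarge", "hi1.4xlarge", "hs1.8xlarge",
}

# Instance types that this guide restricts to specific regions.
# Region codes are an assumption of this sketch; the guide uses region names.
REGION_RESTRICTIONS = {
    "cc2.8xlarge": {"us-east-1", "us-west-2", "eu-west-1"},
    "cc1.4xlarge": {"us-east-1"},
    "hs1.8xlarge": {"us-east-1"},
    "hi1.4xlarge": {"us-east-1", "eu-west-1"},
}

# Earliest CLI version that supports HBase, according to this guide.
MIN_CLI_VERSION = date(2012, 6, 12)


def check_hbase_prerequisites(instance_type, region, instance_count,
                              cli_version=None):
    """Return (errors, warnings) for a planned HBase cluster.

    Hypothetical helper: errors are hard requirements from the list above,
    warnings are items the guide marks as optional.
    """
    errors, warnings = [], []

    if instance_type not in SUPPORTED_INSTANCE_TYPES:
        errors.append(f"{instance_type} is not a supported instance type")

    allowed_regions = REGION_RESTRICTIONS.get(instance_type)
    if allowed_regions is not None and region not in allowed_regions:
        errors.append(f"{instance_type} is not supported in {region}")

    if instance_count < 1:
        errors.append("a cluster needs at least one instance")
    elif instance_count < 2:
        warnings.append("single-node clusters are for evaluation only")

    # The CLI is optional; an old CLI only means the console is needed.
    if cli_version is not None:
        if date.fromisoformat(cli_version) < MIN_CLI_VERSION:
            warnings.append("CLI predates HBase support; use the console")

    return errors, warnings


if __name__ == "__main__":
    errs, warns = check_hbase_prerequisites("hi1.4xlarge", "us-west-2", 1)
    print("errors:", errs)
    print("warnings:", warns)
```

In this sketch, unsupported instance types and region mismatches are treated as errors, while a single-node cluster and an outdated CLI produce only warnings, because the guide marks those items as optional.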