To run HBase, an Amazon EMR cluster should meet the following requirements.
The AWS CLI (Optional)—To interact with HBase using the command line, download and install the latest version of the AWS CLI. For more information, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.
A version of the Amazon EMR CLI that supports HBase (Optional)—CLI version 2012-06-12 or later. To find out which version of the CLI you have, run elastic-mapreduce --version at the command line. For more information about the Amazon EMR CLI and how to install it, see the Command Line Interface Reference for Amazon EMR.
If you do not have the latest version of the CLI installed, you can use the Amazon EMR console to launch HBase clusters.
The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.
At least two instances (Optional)—The cluster's master node runs the HBase master server and ZooKeeper, and the slave nodes run the HBase region servers. For best performance, HBase clusters should run on at least two EC2 instances, but you can run HBase on a single node for evaluation purposes.
Long-running cluster—HBase runs only on long-running clusters. By default, the CLI and the Amazon EMR console create long-running clusters.
An Amazon EC2 key pair (Recommended)—To use the Secure Shell (SSH) network protocol to connect to the master node and run HBase shell commands, you must specify an Amazon EC2 key pair when you create the cluster.
The correct AMI and Hadoop versions—HBase clusters are currently supported only on Hadoop 0.20.205 or later.
Ganglia (Optional)—If you want to monitor HBase performance metrics, you can install Ganglia when you create the cluster.
An Amazon S3 bucket for logs (Optional)—The logs for HBase are available on the master node. If you want these logs copied to Amazon S3, specify an Amazon S3 bucket to receive log files when you create the cluster.
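The CLI version requirement (2012-06-12 or later) can be checked with a script. The following is a minimal sketch, not an official tool. It assumes that the output of `elastic-mapreduce --version` contains a date-style version string in `YYYY-MM-DD` form. The helpers `parse_cli_version` and `cli_supports_hbase` are hypothetical names used only for illustration.

```python
import re
import shutil
import subprocess
from datetime import date
from typing import Optional

# Earliest Amazon EMR CLI version that supports HBase, per the requirement.
MIN_HBASE_CLI_VERSION = date(2012, 6, 12)


def parse_cli_version(output: str) -> Optional[date]:
    """Extract the first YYYY-MM-DD date from the CLI's version output.

    Assumption: the legacy CLI reports its version as a date such as
    2012-06-12. Returns None if no such date is found.
    """
    match = re.search(r"(\d{4})-(\d{2})-(\d{2})", output)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)


def cli_supports_hbase(output: str) -> bool:
    """Return True if the reported CLI version is 2012-06-12 or later."""
    version = parse_cli_version(output)
    return version is not None and version >= MIN_HBASE_CLI_VERSION


if __name__ == "__main__":
    # Only run the real command if the legacy CLI is on the PATH.
    if shutil.which("elastic-mapreduce") is None:
        print("elastic-mapreduce not found; use the console or the AWS CLI instead.")
    else:
        result = subprocess.run(
            ["elastic-mapreduce", "--version"],
            capture_output=True, text=True, check=False,
        )
        print("HBase supported:", cli_supports_hbase(result.stdout + result.stderr))
```

Because the legacy CLI is no longer under feature development, the script falls back to a message when the command is not installed instead of failing.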
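When you check the Hadoop version requirement (Hadoop 0.20.205 or later), compare the version numerically, component by component, and not as a plain string. Compared as text, `"0.20.3"` sorts after `"0.20.205"`, even though it is the older release. The following is a minimal sketch. It assumes versions are written as dot-separated integers such as `0.20.205`. The function names are hypothetical.

```python
from typing import Tuple

# Minimum Hadoop version for HBase clusters, per the requirement.
MIN_HBASE_HADOOP = "0.20.205"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '0.20.205' into (0, 20, 205).

    Assumes every component is a plain integer; suffixes such as
    '-SNAPSHOT' are not handled and raise ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def hadoop_supports_hbase(version: str) -> bool:
    """Return True if the Hadoop version is 0.20.205 or later.

    Python compares tuples element by element, so (0, 20, 3) < (0, 20, 205).
    """
    return parse_version(version) >= parse_version(MIN_HBASE_HADOOP)
```

For example, `hadoop_supports_hbase("1.0.3")` is `True`, while `hadoop_supports_hbase("0.20.3")` is `False`.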
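Several of these requirements correspond to options of the AWS CLI's `aws emr create-cluster` command. The following is a minimal sketch, not an official tool. The helper `build_hbase_create_cluster_args` is hypothetical. It only assembles an argument list and does not call AWS. It also assumes that the default Amazon EMR roles already exist, which `--use-default-roles` requires. You must choose the AMI version and instance type yourself. The values in the usage block are placeholders.

```python
import shlex
import warnings
from typing import List, Optional


def build_hbase_create_cluster_args(
    name: str,
    ami_version: str,
    instance_type: str,
    instance_count: int = 3,
    key_name: Optional[str] = None,
    log_uri: Optional[str] = None,
    with_ganglia: bool = False,
) -> List[str]:
    """Build an `aws emr create-cluster` argument list for an HBase cluster.

    Hypothetical helper: it only assembles arguments and does not call AWS.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if instance_count < 2:
        # A single node is acceptable for evaluation, not for best performance.
        warnings.warn("HBase performs best on at least two instances")
    applications = ["Name=HBase"]
    if with_ganglia:
        # Optional: Ganglia for monitoring HBase performance metrics.
        applications.append("Name=Ganglia")
    args = [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--ami-version", ami_version,
        "--applications", *applications,
        "--use-default-roles",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        # HBase runs only on long-running clusters.
        "--no-auto-terminate",
    ]
    if key_name is not None:
        # Recommended: a key pair lets you SSH to the master node.
        args += ["--ec2-attributes", f"KeyName={key_name}"]
    if log_uri is not None:
        # Optional: copy logs from the master node to Amazon S3.
        args += ["--log-uri", log_uri]
    return args


if __name__ == "__main__":
    args = build_hbase_create_cluster_args(
        name="HBase cluster",
        ami_version="AMI_VERSION",  # placeholder: pick an AMI whose Hadoop supports HBase
        instance_type="m1.large",   # example instance type only
        instance_count=3,
        key_name="my-key-pair",     # hypothetical key pair name
        log_uri="s3://my-log-bucket/",  # hypothetical bucket
        with_ganglia=True,
    )
    # Print a shell-quoted command to review before running it.
    print(shlex.join(args))
```

The script prints the command instead of executing it, so you can review the options before you create a billable cluster.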