Amazon EMR
Developer Guide

HBase Cluster Prerequisites

To run HBase, an Amazon EMR cluster should meet the following requirements. Items marked (optional) or (recommended) are not strictly required:

  • The AWS CLI (optional)—To interact with HBase using the command line, download and install the latest version of the AWS CLI. For more information, see Installing the AWS Command Line Interface in the AWS Command Line Interface User Guide.

  • At least two instances (optional)—The cluster's master node runs the HBase master server and Zookeeper, and slave nodes run the HBase region servers. For best performance, HBase clusters should run on at least two EC2 instances, but you can run HBase on a single node for evaluation purposes.

  • Long-running cluster—HBase only runs on long-running clusters. By default, the CLI and Amazon EMR console create long-running clusters.

  • An Amazon EC2 key pair set (recommended)—To use the Secure Shell (SSH) network protocol to connect with the master node and run HBase shell commands, you must use an Amazon EC2 key pair when you create the cluster.

  • The correct AMI and Hadoop versions—HBase clusters are currently supported only on Hadoop 0.20.205 or later.

  • Ganglia (optional)—To monitor HBase performance metrics, install Ganglia when you create the cluster.

  • An Amazon S3 bucket for logs (optional)—The logs for HBase are available on the master node. If you'd like these logs copied to Amazon S3, specify an S3 bucket to receive log files when you create the cluster.
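
The checklist above can be sketched as a small pre-flight check run before a cluster is created. This is only an illustration, and it assumes the planned settings are held in a local structure. The names ClusterPlan, parse_version, and check_plan are hypothetical and are not part of Amazon EMR or the AWS CLI. Required items are reported as errors; optional and recommended items are reported as warnings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Minimum Hadoop version taken from the prerequisite list above.
MIN_HADOOP = (0, 20, 205)


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (not an AWS API type)."""
    hadoop_version: str
    instance_count: int
    long_running: bool
    key_pair: Optional[str] = None
    ganglia: bool = False
    log_bucket: Optional[str] = None


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as '1.0.3' into a tuple of ints.

    Assumes no suffixes; a value like '1.0.3-beta' would raise ValueError.
    """
    return tuple(int(part) for part in text.split("."))


def check_plan(plan: ClusterPlan) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a plan, following the prerequisite list."""
    errors: List[str] = []
    warnings: List[str] = []

    # Required items.
    if parse_version(plan.hadoop_version) < MIN_HADOOP:
        errors.append("Hadoop version must be 0.20.205 or later")
    if not plan.long_running:
        errors.append("HBase requires a long-running cluster")
    if plan.instance_count < 1:
        errors.append("cluster needs at least one instance")
    elif plan.instance_count < 2:
        warnings.append("single node is suitable for evaluation only")

    # Optional and recommended items.
    if plan.key_pair is None:
        warnings.append("no EC2 key pair: cannot SSH to the master node")
    if not plan.ganglia:
        warnings.append("Ganglia not installed: no Ganglia metrics for HBase")
    if plan.log_bucket is None:
        warnings.append("no S3 log bucket: logs stay on the master node")
    return errors, warnings


if __name__ == "__main__":
    plan = ClusterPlan(hadoop_version="1.0.3", instance_count=1, long_running=True)
    errors, warnings = check_plan(plan)
    for message in errors:
        print("ERROR:", message)
    for message in warnings:
        print("WARN:", message)
```

Separating errors from warnings mirrors the list itself: only the Hadoop version and the long-running setting block HBase from running, while the remaining items affect performance, access, or monitoring.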