Apache HBase on Amazon EMR Architecture Overview
Amazon EMR defines the concept of instance groups, which are collections of Amazon EC2 instances. The Amazon EC2 virtual servers perform roles analogous to the master and worker nodes of Hadoop. For best performance, Apache HBase clusters should run on at least two Amazon EC2 instances. An Amazon EMR cluster has three types of instance groups:
- Master: Contains one master node that manages the cluster. You can use the Secure Shell (SSH) protocol to access this node if you want to view logs or administer the cluster yourself. The master node runs the Apache HBase master server and Apache ZooKeeper.
- Core: Contains one or more core nodes that run HDFS and store data. The core nodes run the Apache HBase region servers.
- Task (optional): Contains any number of task nodes.
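The three instance groups can be sketched as plain data. The following is a minimal illustration, not an official template: it builds a list of dictionaries whose keys (`Name`, `InstanceRole`, `InstanceType`, `InstanceCount`) follow the instance group shape accepted by the Amazon EMR API. The helper name `build_instance_groups`, the default instance type `m5.xlarge`, and the default counts are assumptions made for this example.

```python
from typing import Dict, List


def build_instance_groups(
    core_count: int = 1,
    task_count: int = 0,
    instance_type: str = "m5.xlarge",  # assumed instance type, not taken from the text
) -> List[Dict[str, object]]:
    """Return definitions for one master group, one core group, and an optional task group."""
    if core_count < 1:
        # Core nodes run HDFS and the HBase region servers, so at least one is needed.
        raise ValueError("core_count must be at least 1")
    if task_count < 0:
        raise ValueError("task_count cannot be negative")

    def group(name: str, role: str, count: int) -> Dict[str, object]:
        return {
            "Name": name,
            "InstanceRole": role,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    groups = [
        group("Master", "MASTER", 1),      # HBase master server and ZooKeeper
        group("Core", "CORE", core_count), # HDFS and HBase region servers
    ]
    if task_count > 0:
        # The task group is optional, so it is included only when requested.
        groups.append(group("Task", "TASK", task_count))
    return groups


if __name__ == "__main__":
    for g in build_instance_groups(core_count=2, task_count=1):
        print(g["InstanceRole"], g["InstanceCount"])
```

With the defaults, the sketch yields one master node and one core node, which matches the two-instance minimum mentioned above. A list like this could be supplied as the `InstanceGroups` entry of the `Instances` parameter when creating a cluster through an SDK such as boto3 (`run_job_flow`), together with HBase listed among the cluster applications.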