Apache HBase on Amazon EMR Architecture Overview - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Apache HBase on Amazon EMR Architecture Overview

Amazon EMR organizes a cluster into instance groups, which are collections of Amazon EC2 instances. These instances perform roles analogous to the master and slave nodes of Hadoop. For best performance, run Apache HBase clusters on at least two Amazon EC2 instances. An Amazon EMR cluster has three types of instance groups:

  • Master—Contains one master node that manages the cluster. You can use the Secure Shell (SSH) protocol to access the master node if you want to view logs or administer the cluster yourself. The master node runs the Apache HBase master server and Apache ZooKeeper.

  • Core—Contains one or more core nodes that run HDFS and store data. The core nodes run the Apache HBase region servers.

  • Task—(Optional) Contains any number of task nodes.
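
The three instance groups described above can be sketched as plain Python data. This is a minimal illustration and not an official template. The dictionary keys follow the shape that the boto3 EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`. The helper names `build_instance_groups` and `total_instances`, the `m5.xlarge` instance type, and the default node counts are assumptions chosen for the example.

```python
from typing import Dict, List


def build_instance_groups(core_count: int = 2,
                          task_count: int = 0,
                          instance_type: str = "m5.xlarge") -> List[Dict]:
    """Return illustrative instance group definitions for HBase on Amazon EMR.

    The instance type and counts are placeholder assumptions, not recommendations.
    """
    if core_count < 1:
        # The core group runs HDFS and the HBase region servers, so it cannot be empty.
        raise ValueError("the core group needs at least one node")
    if task_count < 0:
        raise ValueError("task_count cannot be negative")

    groups = [
        # Master group: a single node that runs the HBase master server and ZooKeeper.
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core group: one or more nodes that run HDFS and the HBase region servers.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task group: optional, so it is only added when task nodes are requested.
        groups.append({"Name": "Task", "InstanceRole": "TASK",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


def total_instances(groups: List[Dict]) -> int:
    """Count the Amazon EC2 instances across all instance groups."""
    return sum(group["InstanceCount"] for group in groups)


if __name__ == "__main__":
    for group in build_instance_groups(core_count=2, task_count=1):
        print(group["InstanceRole"], group["InstanceCount"])
```

With one master node and at least one core node, any cluster built this way satisfies the two-instance minimum mentioned earlier. The task group is omitted entirely when `task_count` is zero, which reflects its optional status.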