Self-Managed Apache HBase Deployment Model on Amazon EC2 - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL


Running Apache HBase yourself on Amazon EC2 offers the most flexibility in cluster management, but it also presents the following challenges:

  • Administrative overhead: You are responsible for provisioning, configuring, upgrading, and monitoring your Apache HBase clusters, which adds an ongoing administrative burden.

  • Capacity planning: As with any traditional infrastructure, capacity planning is difficult and prone to costly errors. For example, you could over-invest and pay for unused capacity, or under-invest and risk performance or availability issues.

  • Memory management: Apache HBase is memory-intensive. Each RegionServer relies on its JVM heap for the MemStore, which buffers writes, and the BlockCache, which caches data for reads. Memory can therefore become a limiting factor as the cluster grows. Determine how much memory the applications running on your Apache HBase cluster need, so that nodes do not frequently swap to disk. Plan the number of Apache HBase nodes and their memory requirements well in advance.

  • Compute, storage, and network planning: Compute, storage, and network capacity are other key considerations for operating an Apache HBase cluster effectively. Planning and tuning these components often requires dedicated Apache Hadoop/Apache HBase administrators with specialized skills.
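To make the capacity and memory planning concrete, the following Python snippet sketches a rough sizing calculation. It is a minimal illustration built on assumptions, not a sizing method from this paper. The workload figures, per-node disk size, heap size, and 25% disk headroom are hypothetical. The replication factor of 3 is the HDFS default (`dfs.replication`). The 0.4 heap fractions and the 0.8 combined limit reflect the Apache HBase defaults and constraint for `hbase.regionserver.global.memstore.size` and `hfile.block.cache.size`.

```python
"""Rough sizing for a self-managed Apache HBase cluster on Amazon EC2.

All workload and node figures are illustrative assumptions, not recommendations.
"""
import math
from dataclasses import dataclass


@dataclass
class Workload:
    raw_data_tb: float    # logical data to store, in TB (hypothetical)
    annual_growth: float  # e.g. 0.5 means 50% growth per year (hypothetical)
    years: int            # planning horizon in years


@dataclass
class NodeSpec:
    usable_disk_tb: float  # disk per node available to HDFS, in TB (hypothetical)
    heap_gb: float         # RegionServer JVM heap, in GB (hypothetical)


HDFS_REPLICATION = 3   # HDFS default for dfs.replication
DISK_HEADROOM = 0.25   # assumed free space kept for compactions and temp files


def nodes_for_storage(w: Workload, n: NodeSpec) -> int:
    """Nodes needed so replicated data still fits, with headroom, after `years` of growth."""
    future_tb = w.raw_data_tb * (1 + w.annual_growth) ** w.years
    replicated_tb = future_tb * HDFS_REPLICATION
    usable_per_node_tb = n.usable_disk_tb * (1 - DISK_HEADROOM)
    return math.ceil(replicated_tb / usable_per_node_tb)


def heap_split(n: NodeSpec, memstore_frac: float = 0.4,
               block_cache_frac: float = 0.4) -> dict:
    """Split the RegionServer heap between the MemStore (writes) and BlockCache (reads).

    HBase rejects configurations where the two fractions together exceed 0.8,
    so at least 20% of the heap stays available for other RegionServer work.
    """
    if memstore_frac + block_cache_frac > 0.8:
        raise ValueError("MemStore + BlockCache fractions must not exceed 0.8 of the heap")
    return {
        "memstore_gb": round(n.heap_gb * memstore_frac, 2),
        "block_cache_gb": round(n.heap_gb * block_cache_frac, 2),
        "other_gb": round(n.heap_gb * (1 - memstore_frac - block_cache_frac), 2),
    }


if __name__ == "__main__":
    workload = Workload(raw_data_tb=10, annual_growth=0.5, years=2)
    node = NodeSpec(usable_disk_tb=8, heap_gb=32)
    print("nodes for storage:", nodes_for_storage(workload, node))  # 12
    print("heap split:", heap_split(node))
```

With these hypothetical inputs, 10 TB growing 50% per year becomes 22.5 TB after two years, or 67.5 TB once triple-replicated. At 6 TB of usable space per node, that works out to 12 nodes. Small changes to the growth rate or headroom can shift the result noticeably, which illustrates why the over- or under-investment risk described above is hard to avoid with fixed, self-managed capacity.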