Managed Apache HBase on Amazon EMR (HDFS Storage Mode) - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Managed Apache HBase on Amazon EMR (HDFS Storage Mode)

Apache HBase on Amazon EMR is optimized to run on AWS and offers the following benefits:

  • Minimal administrative overhead—Amazon EMR handles provisioning of Amazon EC2 instances, security settings, Apache HBase configuration, log collection, health monitoring, and replacement of faulty instances. You still have the flexibility to access the underlying infrastructure and customize Apache HBase further, if desired.

  • Easy and flexible deployment options—You can deploy Apache HBase on Amazon EMR using the AWS Management Console or the AWS Command Line Interface (AWS CLI). After launch, you can resize an Apache HBase cluster with a single API call. You can use custom or predefined scripts to modify the Apache HBase configuration at launch time or to install third-party tools, such as Ganglia, for monitoring performance metrics.

  • Unlimited scale—With Apache HBase running on Amazon EMR, you gain cloud benefits such as easy scaling, low cost, pay-as-you-go pricing, and ease of use, in contrast to the self-managed deployment model on Amazon EC2.

  • Integration with other AWS services—Amazon EMR is designed to seamlessly integrate with other AWS services, such as Amazon S3, Amazon DynamoDB, Amazon EC2, and Amazon CloudWatch.

  • Built-in backup feature—A key benefit of Apache HBase running on Amazon EMR is the built-in mechanism for backing up Apache HBase data durably in Amazon S3. Using this feature, you can schedule full or incremental backups, and roll back or restore backups to existing or newly launched clusters at any time.
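
The deployment and resizing points above can be illustrated with a minimal sketch. It only builds the request parameters for the Amazon EMR RunJobFlow and ModifyInstanceGroups API operations as plain Python dictionaries, so it runs without AWS credentials. The helper names, the cluster name, the release label, the instance type, and the default role names are assumptions chosen for illustration, not values taken from this paper.

```python
from typing import Any, Dict


def build_hbase_cluster_request(name: str, release_label: str,
                                instance_type: str, core_count: int) -> Dict[str, Any]:
    """Build keyword arguments for the EMR RunJobFlow operation with HBase installed.

    Hypothetical helper: the role names below are the EMR default roles and
    are assumed to exist in the target account.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "HBase"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # Keep the cluster running after launch so HBase stays available.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def build_resize_request(cluster_id: str, instance_group_id: str,
                         new_count: int) -> Dict[str, Any]:
    """Build keyword arguments for the EMR ModifyInstanceGroups operation."""
    if new_count < 1:
        raise ValueError("new_count must be at least 1")
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {"InstanceGroupId": instance_group_id, "InstanceCount": new_count},
        ],
    }


if __name__ == "__main__":
    launch = build_hbase_cluster_request("hbase-demo", "emr-5.36.0", "m5.xlarge", 3)
    print(launch["Instances"]["InstanceGroups"][1]["InstanceCount"])
    # With boto3 installed and credentials configured, the requests could be sent as:
    #   import boto3
    #   emr = boto3.client("emr")
    #   cluster_id = emr.run_job_flow(**launch)["JobFlowId"]
    #   emr.modify_instance_groups(**build_resize_request(cluster_id, group_id, 5))
    # where group_id is the core instance group ID reported by the service.
```

Keeping the request construction separate from the API call makes the parameters easy to review or test before any cluster is launched; the resize itself remains a single ModifyInstanceGroups call.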