Managed Apache HBase on Amazon EMR (Amazon S3 Storage Mode) - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Managed Apache HBase on Amazon EMR (Amazon S3 Storage Mode)

Amazon EMR enables you to use Amazon S3 as the data store for Apache HBase through the EMR File System (EMRFS). This storage mode offers the following benefits:

  • Separation of compute from storage: You can size your Amazon EMR cluster for its compute requirements rather than for the volume of data it stores, and you avoid the customary 3x replication that HDFS would otherwise require.

  • Transient clusters: You can scale compute nodes without affecting the underlying storage. You can also terminate the cluster to save costs and later restore it quickly, because the data remains in Amazon S3.

  • Built-in availability and durability: You get the availability and durability of Amazon S3 storage by default.

  • Easy-to-provision read replicas: You can create and configure a read-replica cluster in another Amazon EC2 Availability Zone that provides read-only access to the same data as the primary cluster. Read access to your data then continues even if the primary cluster becomes unavailable.
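The following Python is a minimal sketch of how a primary cluster and a read-replica cluster might be configured to share one HBase root directory in Amazon S3. It assumes the configuration classifications and property names described in the Amazon EMR documentation for HBase on Amazon S3 (`hbase.rootdir` in `hbase-site`, and `hbase.emr.storageMode` and `hbase.emr.readreplica.enabled` in `hbase`). The helper name `hbase_s3_configurations` and the bucket path are hypothetical, chosen for illustration only.

```python
import json


def hbase_s3_configurations(root_dir: str, read_replica: bool = False) -> list:
    """Build EMR configuration classifications for HBase in Amazon S3 storage mode.

    Hypothetical helper. root_dir is the S3 location of the HBase root directory;
    a primary cluster and its read replica are assumed to point at the same root_dir.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    # Store HBase data in Amazon S3 (via EMRFS) instead of HDFS.
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Assumed flag that marks this cluster as a read-only replica.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return [
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase", "Properties": hbase_props},
    ]


# Hypothetical bucket and prefix, for illustration only.
ROOT = "s3://my-example-bucket/hbase-root"

primary = hbase_s3_configurations(ROOT)
replica = hbase_s3_configurations(ROOT, read_replica=True)

print(json.dumps(replica, indent=2))
```

Each resulting list could be supplied as the configurations when launching a cluster, for example through the AWS CLI `--configurations` option or the `Configurations` parameter of boto3's `run_job_flow`. Because both configurations reference the same root directory, a replacement cluster launched with the primary configuration after a transient cluster has been terminated would point at the same data in Amazon S3.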