Read Performance Considerations - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Read Performance Considerations

With Amazon S3 storage mode enabled, Apache HBase region servers use MemStore to hold data writes in memory, and use write-ahead logs to persist data writes in HDFS before the data is written to HBase StoreFiles in Amazon S3. Reading records directly from a StoreFile in Amazon S3 results in significantly higher latency, and a higher standard deviation in latency, than reading from HDFS.

Amazon S3 scales to support very high request rates. If your request rate grows steadily, Amazon S3 automatically partitions your buckets as needed to support higher request rates. However, the maximum request rates for Amazon S3 are lower than what can be achieved from the local cache. For more information about Amazon S3 performance, see Performance Optimization.

For read-heavy workloads, caching data in memory or in on-disk caches on Amazon EC2 instance storage is recommended. Apache HBase region servers use BlockCache to store data reads in memory and BucketCache to store data reads on EC2 instance storage, so choose an EC2 instance type with enough instance store to hold the data you want cached.
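As a rough illustration of what "sufficient instance store" can mean, the following sketch estimates a per-node BucketCache size from a frequently read data set. It assumes the data is spread evenly across region servers and that you want some headroom on top; the function names, the headroom figure, and the sample numbers are hypothetical and are not taken from the HBase or Amazon EMR documentation.

```python
def bucketcache_mb_per_node(working_set_mb: int, region_servers: int,
                            headroom_pct: int = 20) -> int:
    """Estimate the cache size (in MB) each region server needs.

    Assumes the working set is evenly distributed across region servers,
    which real tables with hot regions may not satisfy.
    """
    if region_servers < 1:
        raise ValueError("region_servers must be at least 1")
    if working_set_mb < 0 or headroom_pct < 0:
        raise ValueError("sizes and headroom must be non-negative")
    numerator = working_set_mb * (100 + headroom_pct)
    denominator = 100 * region_servers
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-numerator // denominator)


def fits_on_storage(cache_mb: int, available_storage_gb: int) -> bool:
    """Check whether the estimated cache fits on the storage attached to a node."""
    return cache_mb <= available_storage_gb * 1024


if __name__ == "__main__":
    # Hypothetical cluster: 500 GB of hot data across 10 region servers.
    needed = bucketcache_mb_per_node(500 * 1024, 10)
    print(needed, fits_on_storage(needed, 75))
```

If the estimate does not fit on the instance store of the instance type you are considering, that is a signal to pick a larger instance type, add nodes, or attach additional storage.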

In addition, you can add Amazon Elastic Block Store (Amazon EBS) storage to accommodate your required cache size. You can increase the BucketCache size on attached instance stores and EBS volumes using the hbase.bucketcache.size property.
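The sketch below shows one way such a setting might be expressed as an Amazon EMR configuration object. Only hbase.bucketcache.size comes from the text above. The classification names, the hbase.emr.storageMode and hbase.rootdir properties, and the assumption that the size is given in megabytes reflect common Amazon EMR and HBase conventions and should be checked against the documentation for your release. The bucket name and cache size are placeholders.

```python
import json


def hbase_s3_cache_config(root_dir: str, cache_size_mb: int) -> list:
    """Build an EMR-style configuration list for HBase on S3 with a sized BucketCache.

    root_dir and cache_size_mb are caller-supplied; nothing here is validated
    against a live cluster.
    """
    if cache_size_mb <= 0:
        raise ValueError("cache_size_mb must be positive")
    return [
        {
            # Assumed classification that switches HBase to S3 storage mode.
            "Classification": "hbase",
            "Properties": {"hbase.emr.storageMode": "s3"},
        },
        {
            "Classification": "hbase-site",
            "Properties": {
                "hbase.rootdir": root_dir,
                # Property named in the text; value assumed to be in MB.
                "hbase.bucketcache.size": str(cache_size_mb),
            },
        },
    ]


if __name__ == "__main__":
    # Placeholder bucket and a hypothetical 60 GB cache.
    config = hbase_s3_cache_config("s3://amzn-s3-demo-bucket/hbase-root", 61440)
    print(json.dumps(config, indent=2))
```

The printed JSON could then be supplied as the cluster configuration when the cluster is created. The cache size you choose should not exceed the capacity of the instance store or EBS volumes that back the cache.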