Write Performance Considerations - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Write Performance Considerations

As discussed in the preceding section, the frequency of MemStore flushes and the number of StoreFiles present during minor and major compactions can significantly increase region server response times and, consequently, degrade write performance. For optimal write performance, consider increasing the MemStore flush size and the HRegion block multiplier, which increases the elapsed time between major compactions.
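As a rough illustration of the two settings named above, the following sketch builds an Amazon EMR style configuration object for the hbase-site classification. The property names hbase.hregion.memstore.flush.size and hbase.hregion.memstore.block.multiplier are standard Apache HBase settings. The helper name hbase_write_tuning and the values used (256 MiB and 8) are assumptions chosen only for the example, not recommendations from this paper. Suitable values depend on heap size and workload.

```python
import json

MIB = 1024 * 1024


def hbase_write_tuning(flush_size_mib: int = 256, block_multiplier: int = 8) -> list:
    """Build an EMR-style configuration list for hbase-site.

    The default arguments are illustrative assumptions, not recommended values.
    """
    if flush_size_mib <= 0 or block_multiplier <= 0:
        raise ValueError("flush size and block multiplier must be positive")
    return [
        {
            "Classification": "hbase-site",
            "Properties": {
                # MemStore size, in bytes, at which a flush to a StoreFile is triggered.
                "hbase.hregion.memstore.flush.size": str(flush_size_mib * MIB),
                # Updates to a region are blocked once its MemStore reaches
                # block multiplier * flush size.
                "hbase.hregion.memstore.block.multiplier": str(block_multiplier),
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON that could be supplied as cluster configuration.
    print(json.dumps(hbase_write_tuning(), indent=2))
```

On a self-managed cluster, the same two properties would instead be set in hbase-site.xml on the region servers.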

Apache HBase compactions and region servers perform optimally when fewer StoreFiles need to be compacted. You may get better performance using larger file block sizes (but less than 5 GB) to trigger Amazon S3 multipart upload functionality in EMRFS.
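To make the file size guidance above more concrete, the sketch below estimates how many parts a file would be split into for a multipart upload. It is a simplified model under stated assumptions: the 128 MiB part size is an assumed value for illustration, the function name multipart_parts is hypothetical, and "5 GB" is read as 5 GiB. The 10,000-part ceiling is the documented Amazon S3 multipart limit.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Upper bound on file size taken from the guidance above (read here as 5 GiB).
MAX_FILE_SIZE = 5 * GIB
# Amazon S3 allows at most 10,000 parts per multipart upload.
S3_MAX_PARTS = 10_000


def multipart_parts(file_size: int, part_size: int = 128 * MIB) -> int:
    """Return how many parts a file of file_size bytes would need.

    part_size is an assumed value for illustration. A result of 1 means the
    file fits in a single part under this simplified model.
    """
    if file_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    if file_size >= MAX_FILE_SIZE:
        raise ValueError("file size should stay below 5 GiB")
    parts = (file_size + part_size - 1) // part_size  # ceiling division
    if parts > S3_MAX_PARTS:
        raise ValueError("too many parts for one multipart upload")
    return parts


if __name__ == "__main__":
    for size in (100 * MIB, 1 * GIB, 4 * GIB):
        print(f"{size // MIB} MiB -> {multipart_parts(size)} part(s)")
```

Under these assumptions, a 1 GiB StoreFile is uploaded as 8 parts, while a 100 MiB file fits in a single part.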

In summary, whether you are running a managed NoSQL database, such as Amazon DynamoDB or Apache HBase on Amazon EMR, or managing your Apache HBase cluster yourself on Amazon EC2 or on-premises, you should apply performance optimizations if you want to maximize performance at reduced cost.

The key difference between a hosted NoSQL solution and managing it yourself is that a managed solution, such as Amazon DynamoDB or Apache HBase on Amazon EMR, lets you offload the bulk of the administration overhead so that you can focus on optimizing your application.

If you are a developer who is getting started with NoSQL, Amazon DynamoDB and hosted Apache HBase on Amazon EMR are both suitable options, depending on your use case. For developers with in-depth Apache Hadoop and Apache HBase knowledge who need full control of their Apache HBase clusters, the self-managed Apache HBase deployment model offers the most flexibility from a cluster management standpoint.