Table Operations - Comparing the Use of Amazon DynamoDB and Apache HBase for NoSQL

Table Operations

Amazon DynamoDB and Apache HBase provide scan operations to support large-scale analytical processing. A scan operation is similar to a cursor in an RDBMS. By taking advantage of the underlying sequential, sorted storage layout, a scan operation can iterate over wide ranges of records or entire tables. Applying filters to a scan operation narrows the result set and can improve performance by reducing the amount of data returned. Note that in Amazon DynamoDB a filter is applied after the items are read, so it reduces the data returned rather than the read capacity consumed.
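The idea of scanning a key range over sorted storage and filtering the rows along the way can be illustrated with a minimal in-memory sketch. It does not use either database's API: the `range_scan` function, the row keys, and the `age` attribute are hypothetical names chosen for illustration, and a sorted Python list stands in for the sorted storage layout.

```python
from bisect import bisect_left


def range_scan(rows, start_key, stop_key, row_filter=None):
    """Yield (key, value) pairs with start_key <= key < stop_key.

    `rows` must be a list of (key, value) tuples sorted by key. It is a
    stand-in for sorted storage, not a real DynamoDB or HBase table.
    """
    # Binary search finds the first row at or after start_key.
    i = bisect_left(rows, (start_key,))
    # Walk forward sequentially until the stop key is reached.
    while i < len(rows) and rows[i][0] < stop_key:
        key, value = rows[i]
        # The filter runs after the row is read: it trims the result set,
        # but every row in the range is still visited.
        if row_filter is None or row_filter(key, value):
            yield key, value
        i += 1


# Hypothetical sample data, sorted by row key.
rows = [
    ("user#001", {"age": 31}),
    ("user#002", {"age": 17}),
    ("user#003", {"age": 45}),
    ("user#101", {"age": 52}),
]

adults = list(range_scan(rows, "user#001", "user#100",
                         lambda key, value: value["age"] >= 18))
adult_keys = [key for key, _ in adults]
```

In this sketch the key range limits how many rows are visited, while the filter only limits how many are returned, which is why a narrow key range usually matters more for performance than a selective filter.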

Amazon DynamoDB supports parallel scanning to improve the performance of a scan operation. A parallel scan logically subdivides an Amazon DynamoDB table into multiple segments, and then processes each segment in parallel. In Apache HBase, the default scan operation reads rows sequentially; to read rows in parallel, you can implement a custom parallel scan through the client API.
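A parallel scan of this kind might be sketched as follows. The sketch assumes a scan callable that accepts the `Segment`, `TotalSegments`, and `ExclusiveStartKey` keyword arguments and returns a dictionary with `Items` and an optional `LastEvaluatedKey`, which is the shape of the DynamoDB Scan request and response. The helper names `scan_segment`, `parallel_scan`, and `make_fake_scan` are hypothetical, and the fake scan function exists only so the example runs without an AWS account.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_fn, segment, total_segments):
    """Read every page of one segment and return its items."""
    items = []
    kwargs = {"Segment": segment, "TotalSegments": total_segments}
    while True:
        page = scan_fn(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages in this segment
        # Resume the next page where the previous one stopped.
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(scan_fn, total_segments=4):
    """Scan all segments concurrently, one worker thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_fn, segment, total_segments)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]


def make_fake_scan(items, page_size=2):
    """Build an in-memory stand-in for a paginated, segmented scan.

    Assumption for illustration only: items are assigned to segments by
    id modulo TotalSegments, which is not how DynamoDB partitions data.
    """
    def fake_scan(Segment, TotalSegments, ExclusiveStartKey=None):
        mine = [it for it in items if it["id"] % TotalSegments == Segment]
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["offset"]
        page = {"Items": mine[start:start + page_size]}
        if start + page_size < len(mine):
            page["LastEvaluatedKey"] = {"offset": start + page_size}
        return page
    return fake_scan


table_items = [{"id": i} for i in range(10)]
scanned = parallel_scan(make_fake_scan(table_items), total_segments=3)
scanned_ids = sorted(item["id"] for item in scanned)
```

With the AWS SDK for Python, the `scan` method of a table resource could be passed as `scan_fn` in place of the fake. The number of segments is a tuning choice: more segments increase concurrency but also consume the table's read capacity faster.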

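Both Amazon DynamoDB and Apache HBase provide a Query API for complex query processing in addition to the scan operation. The Query API in Amazon DynamoDB requires a partition key value and is most useful in tables that define a composite primary key (partition key and sort key), where it can return all items that share a partition key and optionally narrow them by sort key. In Apache HBase, bloom filters improve the performance of Get operations, and the potential performance gain increases with the number of parallel reads.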
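To see why a bloom filter can speed up a Get, consider a minimal sketch in which each store file carries a small bloom filter over its row keys, so a lookup can skip files that definitely do not contain the requested row. This is a simplified model and not the Apache HBase implementation: the `BloomFilter` class, the `build_store_file` and `get` helpers, and the sizing parameters are all hypothetical.

```python
import hashlib


class BloomFilter:
    """A small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def build_store_file(data):
    """Pair a dict of rows with a bloom filter over its row keys."""
    bloom = BloomFilter()
    for row_key in data:
        bloom.add(row_key)
    return bloom, data


def get(store_files, row_key):
    """Return (value, files_read), skipping files the filter rules out."""
    files_read = 0
    for bloom, data in store_files:
        if not bloom.might_contain(row_key):
            continue  # definitely absent: no need to read this file
        files_read += 1
        if row_key in data:
            return data[row_key], files_read
    return None, files_read


store_files = [
    build_store_file({"row-c": "v2"}),
    build_store_file({"row-a": "v1", "row-b": "v1"}),
]
value, files_read = get(store_files, "row-b")
```

Because a bloom filter never reports a stored key as absent, skipping a file on a negative answer is always safe. A positive answer can occasionally be wrong, so the file still has to be checked.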

In summary, Amazon DynamoDB and Apache HBase have similar data processing models in that they both support only atomic single-row transactions. Both databases also provide batch operations for bulk data processing across multiple rows and tables.

One key difference between the two databases is the flexible provisioned throughput model of Amazon DynamoDB. The ability to increase capacity when you need it and decrease it when you are done is useful for processing variable workloads with unpredictable peaks.

For workloads that need high update rates to perform data aggregations or maintain counters, Apache HBase is a good choice because it supports a multi-version concurrency control (MVCC) mechanism, which contributes to its strongly consistent reads and writes. Amazon DynamoDB lets you specify, for each read request, whether the read should be eventually consistent or strongly consistent, depending on your workload.