Throughput Model

Amazon DynamoDB uses a provisioned throughput model to process data. With this model, you specify your read and write capacity needs in terms of the number of input/output operations per second that a table is expected to handle. At table creation time, Amazon DynamoDB automatically partitions the table and reserves the appropriate amount of resources to meet your specified throughput requirements.
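
As a concrete illustration, the following is a minimal sketch using the AWS SDK for Python (boto3); the table name, key schema, region, and capacity values are illustrative assumptions, not recommendations.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Hypothetical table: the name, key schema, and capacity values are
# illustrative only.
dynamodb.create_table(
    TableName="ProductCatalog",
    AttributeDefinitions=[
        {"AttributeName": "Id", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "Id", "KeyType": "HASH"},
    ],
    # Provisioned throughput is declared up front; DynamoDB partitions
    # the table and reserves resources to meet these targets.
    ProvisionedThroughput={
        "ReadCapacityUnits": 10,
        "WriteCapacityUnits": 5,
    },
)
```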

Automatic scaling for Amazon DynamoDB automates capacity management and eliminates the guesswork involved in provisioning adequate capacity when creating new tables and global secondary indexes. With automatic scaling enabled, you specify a target utilization percentage, and DynamoDB scales the provisioned read and write capacity within the bounds you set to meet that target. For more information, see Managing Throughput Capacity Automatically with DynamoDB Auto Scaling.
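
Auto scaling is configured through the Application Auto Scaling service. The sketch below registers a table's read capacity as a scalable target and attaches a target-tracking policy; the table name, capacity bounds, and 70 percent target are assumed values for illustration.

```python
import boto3

autoscaling = boto3.client("application-autoscaling", region_name="us-east-1")

# Register the table's read capacity as a scalable target with lower
# and upper bounds (values are illustrative).
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/ProductCatalog",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Target-tracking policy: provisioned reads are adjusted to hold
# consumed/provisioned utilization near 70 percent.
autoscaling.put_scaling_policy(
    PolicyName="ReadCapacityUtilizationTracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/ProductCatalog",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```

A matching target and policy for write capacity would use the `dynamodb:table:WriteCapacityUnits` dimension.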

To decide on the required read and write throughput values for a table without the auto scaling feature enabled, consider the following factors (a worked sizing example follows the list):

  • Item size—The read and write capacity units that you specify are based on a predefined data item size per read or write operation (up to 4 KB per read capacity unit and up to 1 KB per write capacity unit). For more information about provisioned throughput data item size restrictions, see Provisioned Throughput in Amazon DynamoDB.

  • Expected read and write request rates—You must also determine the number of read and write operations per second that your application will perform against the table.

  • Consistency—Whether your application requires strongly consistent or eventually consistent reads affects how many read capacity units you need to provision for your table; a strongly consistent read consumes twice the read capacity of an eventually consistent read. For more information about consistency and Amazon DynamoDB, see the Consistency Model section in this document.

  • Global secondary indexes—The provisioned throughput settings of a global secondary index are separate from those of its parent table. Therefore, you must also consider the expected workload on the global secondary index when specifying the read and write capacity at index creation time.

  • Local secondary indexes—Queries against local secondary indexes consume read capacity units from the parent table's provisioned throughput. For more information, see Provisioned Throughput Considerations for Local Secondary Indexes.
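
To make this arithmetic concrete, the following sketch estimates capacity units from the factors above using the published unit sizes (4 KB per read capacity unit, 1 KB per write capacity unit); the item size and request rates are hypothetical.

```python
import math

# Hypothetical workload: 6 KB items, 80 reads/sec, 20 writes/sec.
item_size_kb = 6
reads_per_second = 80
writes_per_second = 20

# One read capacity unit = one strongly consistent read/sec of an item
# up to 4 KB; an eventually consistent read costs half as much.
rcu_per_read = math.ceil(item_size_kb / 4)       # 2 units per 6 KB read
strong_rcu = rcu_per_read * reads_per_second     # 160 RCU
eventual_rcu = math.ceil(strong_rcu / 2)         # 80 RCU

# One write capacity unit = one write/sec of an item up to 1 KB.
wcu = math.ceil(item_size_kb / 1) * writes_per_second  # 120 WCU

print(strong_rcu, eventual_rcu, wcu)
```

A global secondary index on this table would need its own, separate calculation of this form, based on the expected workload against the index.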

Although read and write requirements are specified at table creation time, Amazon DynamoDB lets you increase or decrease the provisioned throughput to accommodate changes in load with no downtime, as in the sketch below.
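
For example, an in-place throughput change might look like the following boto3 sketch; the table name and new capacity values are again illustrative.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Raise the table's provisioned throughput in place; the table remains
# available for reads and writes while the update is applied.
dynamodb.update_table(
    TableName="ProductCatalog",
    ProvisionedThroughput={
        "ReadCapacityUnits": 200,
        "WriteCapacityUnits": 50,
    },
)
```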

With Apache HBase, the number of nodes in a cluster can be driven by the required throughput for reads, writes, or both. The achievable throughput on a given node can vary depending on the data and workload, specifically:

  • Key/value sizes

  • Data access patterns

  • Cache hit rates

  • Node and system configuration

If load is likely to be the primary factor that increases node count within an Apache HBase cluster, you should plan for peak load; the sketch below illustrates this kind of sizing estimate.
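
As a rough illustration, the following derives a node count from an assumed per-node throughput; every figure here is a hypothetical placeholder that you would replace with benchmarks measured against your own key/value sizes, access patterns, cache hit rates, and configuration.

```python
import math

# All figures are hypothetical; benchmark per-node throughput for your
# own data, access patterns, cache behavior, and hardware.
peak_reads_per_second = 150_000
peak_writes_per_second = 50_000
ops_per_node_per_second = 20_000  # assumed benchmarked per-node capacity

# Size the cluster for peak load, not average load.
nodes_for_peak = math.ceil(
    (peak_reads_per_second + peak_writes_per_second) / ops_per_node_per_second
)
print(nodes_for_peak)  # 10 nodes to absorb this peak
```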