Guidelines for Working with Items

When you are working with items in DynamoDB, you need to consider how to get the best performance, how to reduce provisioned throughput costs, and how to avoid throttling by staying within your read and write capacity units. If the items that you are handling exceed the maximum item size, as described in Limits in DynamoDB, you also need to decide how to handle them. This section offers best practices for addressing these considerations.

Use One-to-Many Tables Instead Of Large Set Attributes

If your table has items that store a large set type attribute, such as number set or string set, consider removing the attribute and breaking the table into two tables. To form one-to-many relationships between these tables, use the primary keys.

The Forum, Thread, and Reply tables in the Creating Tables and Loading Sample Data section are good examples of these one-to-many relationships. For example, the Thread table has one item for each forum thread, and the Reply table stores one or more replies for each thread.

Instead of storing replies as items in a separate table, you could store both threads and replies in the same table. For each thread, you could store all replies in an attribute of string set type; however, keeping thread and reply data in separate tables is beneficial in several ways:

  • If you store replies as items in a table, you can store any number of replies, because a DynamoDB table can store any number of items.

    If you store replies as an attribute value in the Thread table, you would be constrained by the maximum item size, which would limit the number of replies that you could store. (See Limits in DynamoDB.)

  • When you retrieve a Thread item, you pay less for provisioned throughput, because you are retrieving only the thread data and not all the replies for that thread.

  • Queries allow you to retrieve only a subset of items from a table. By storing replies in a separate Reply table, you can retrieve only a subset of replies, for example, those within a specific date range, by querying the Reply table (see the query sketch after this list).

    If you store replies as a set type attribute value, you would always have to retrieve all the replies, which would consume more provisioned throughput for data that you might not need.

  • When you add a new reply to a thread, you add only an item to the Reply table, which incurs the provisioned throughput cost of only that single Reply item. If replies are stored in the Thread table, you incur the full cost of writing the entire Thread item including all replies whenever you add a single user reply to a thread.
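
For example, here is a minimal sketch, using the AWS SDK for Python (Boto3), of querying the Reply table for only the replies in one thread that fall within a date range. The table and key names come from the sample data; the thread key value and date range are illustrative.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
reply_table = dynamodb.Table('Reply')

# Retrieve only the replies for one thread that were posted in March 2012,
# instead of reading every reply for that thread.
response = reply_table.query(
    KeyConditionExpression=(
        Key('Id').eq('DynamoDB#Thread1') &
        Key('ReplyDateTime').between('2012-03-01T00:00:00.000Z',
                                     '2012-03-31T23:59:59.999Z')))

for item in response['Items']:
    print(item['ReplyDateTime'], item['Message'])
```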

Use Multiple Tables to Support Varied Access Patterns

If you frequently access large items in a DynamoDB table and you do not always use all of an item's larger attributes, you can improve your efficiency and make your workload more uniform by storing your smaller, more frequently accessed attributes as separate items in a separate table.

For example, consider the ProductCatalog table described in the Creating Tables and Loading Sample Data section. Items in this table contain basic product information, such as product name and description. This information changes infrequently, but it is used every time an application displays a product for the user.

If your application also needed to keep track of rapidly changing product attributes, such as price and availability, you could store this information in a separate table called ProductAvailability. This approach would minimize the throughput cost of updates. To illustrate, suppose that a ProductCatalog item was 3 KB in size, and the price and availability attributes for that item were 300 bytes. In this case, an update of these rapidly changing attributes would cost three write capacity units, the same cost as updating any other product attribute. Now suppose that price and availability information were stored in a ProductAvailability table instead. In this case, updating the information would cost only one write capacity unit.

Note

For an explanation of capacity units, see Provisioned Throughput.
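
As a sketch of the update path, the following code (AWS SDK for Python, Boto3) updates only the small ProductAvailability item. The attribute names match the example tables shown later in this section; the key value is illustrative.

```python
import boto3

dynamodb = boto3.resource('dynamodb')
availability = dynamodb.Table('ProductAvailability')

# The ProductAvailability item is well under 1 KB, so this update consumes
# one write capacity unit instead of the three units that rewriting the
# full 3 KB ProductCatalog item would consume.
availability.update_item(
    Key={'Id': 302},
    UpdateExpression='SET Price = :p, QuantityOnHand = :q',
    ExpressionAttributeValues={':p': '$125.00 USD', ':q': 7})
```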

If your application also needed to store product data that is displayed less frequently, you could store this information in a separate table called ExtendedProductCatalog. Such data might include product dimensions, a track listing for music albums, or other attributes that are not accessed as often as the basic product data. This way, the application would only consume throughput when displaying basic product information to the users, and would only consume additional throughput if the user requests the extended product details.

The following are example instances of the tables in the preceding discussion. Note that all the tables have an Id attribute as the primary key.

ProductCatalog

Id    Title           Description
21    "Famous Book"   "From last year's best sellers list, ..."
302   "Red Bicycle"   "Classic red bicycle..."
58    "Music Album"   "A new popular album from this week..."

ProductAvailability

Id    Price           QuantityOnHand
21    "$5.00 USD"     3750
302   "$125.00 USD"   8
58    "$5.00 USD"     "infinity (digital item)"

ExtendedProductCatalog

Id    AverageCustomerRating   TrackListing
21    5
302   3.5
58    4                       {"Track1#3:59", "Track2#2:34", "Track3#5:21", ...}

Here are several advantages and considerations for splitting the attributes of an item into multiple items in different tables:

  • The throughput cost of reading or writing these attributes is reduced. The cost of updating a single attribute of an item is based on the full size of the item. If the items are smaller, you will incur less throughput when you access each one.

  • If you keep your frequently accessed items small, your I/O workload will be distributed more evenly. Retrieving a large item can consume a great deal of read capacity all at once from the same partition of your table; this can make your workload uneven, which can cause throttling. For more information, see Avoid Sudden Bursts of Read Activity.

  • For single-item read operations, such as GetItem, throughput calculations are rounded up to the next 4 KB boundary. If your items are smaller than 4 KB and you retrieve them by primary key only, storing item attributes as separate items may not reduce throughput. However, the throughput cost of operations such as Query and Scan is calculated differently: the sizes of all returned items are totaled, and that final total is rounded up to the next 4 KB boundary, as the sketch following this list illustrates. For these operations, you might still reduce your throughput cost by moving the large attributes into separate items. For more information, see Capacity Unit Calculations.
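
To make the rounding concrete, here is a small sketch of the calculation described in the last item above. It is a simplification; for the authoritative rules, see Capacity Unit Calculations.

```python
import math

def read_capacity_units(item_sizes_bytes, strongly_consistent=True):
    # Query and Scan total the sizes of all returned items first, then
    # round that total up to the next 4 KB boundary.
    units = math.ceil(sum(item_sizes_bytes) / 4096)
    return units if strongly_consistent else units / 2

# A single 3,300-byte item read with GetItem rounds up to 4 KB: 1 unit.
print(read_capacity_units([3300]))        # 1
# Twenty 300-byte replies returned by one Query total 6,000 bytes,
# which rounds up to 8 KB: 2 units.
print(read_capacity_units([300] * 20))    # 2
```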

Compress Large Attribute Values

You can compress large values before storing them in DynamoDB. Doing so can reduce the cost of storing and retrieving such data. Compression algorithms, such as GZIP or LZO, produce a binary output. You can then store this output in a Binary attribute type.

For example, the Reply table in the Creating Tables and Loading Sample Data section stores messages written by forum users. These user replies might consist of very long strings of text, which makes them excellent candidates for compression.
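
As a sketch (AWS SDK for Python, Boto3), a long reply could be compressed with GZIP and stored in an attribute of the Binary type. The ExtendedMessage attribute name is illustrative.

```python
import gzip
import boto3

dynamodb = boto3.resource('dynamodb')
reply_table = dynamodb.Table('Reply')

long_message = "A very long forum reply... " * 1000

# Compress the text and store the binary output in a Binary (B) attribute.
reply_table.put_item(Item={
    'Id': 'DynamoDB#Thread1',
    'ReplyDateTime': '2012-03-15T20:42:54.023Z',
    'ExtendedMessage': gzip.compress(long_message.encode('utf-8'))})

# When reading the item back, decompress the Binary value.
item = reply_table.get_item(
    Key={'Id': 'DynamoDB#Thread1',
         'ReplyDateTime': '2012-03-15T20:42:54.023Z'})['Item']
original_text = gzip.decompress(item['ExtendedMessage'].value).decode('utf-8')
```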

For sample code that demonstrates how to compress long messages in DynamoDB, see:

Store Large Attribute Values in Amazon S3

DynamoDB currently limits the size of the items that you store in tables. For more information, see Limits in DynamoDB. Your application, however, might need to store more data in an item than the DynamoDB size limits permit. To work around this issue, you can store the large attributes as an object in Amazon Simple Storage Service (Amazon S3), and store the object identifier in your item. You can also use the object metadata support in Amazon S3 to store the primary key value of the corresponding item as Amazon S3 object metadata. This use of metadata can help with future maintenance of your Amazon S3 objects.

For example, consider the ProductCatalog table in the Creating Tables and Loading Sample Data section. Items in the ProductCatalog table store information about item price, description, authors for books, and dimensions for other products. If you wanted to store an image of each product, these images could be large. It might make sense to store the images in Amazon S3 instead of DynamoDB.
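
The following sketch (AWS SDK for Python, Boto3) illustrates the pattern: upload the image to Amazon S3 with the item's primary key recorded as object metadata, then store only the object key in the DynamoDB item. The bucket name, file name, and ImageObjectKey attribute are illustrative.

```python
import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
catalog = dynamodb.Table('ProductCatalog')

product_id = 302
object_key = f'product-images/{product_id}.jpg'

# Upload the large image to Amazon S3, recording the DynamoDB primary key
# as object metadata to help with future maintenance of the object.
with open('red-bicycle.jpg', 'rb') as image:
    s3.put_object(
        Bucket='my-product-images',
        Key=object_key,
        Body=image,
        Metadata={'product-id': str(product_id)})

# Store only the S3 object identifier in the DynamoDB item.
catalog.update_item(
    Key={'Id': product_id},
    UpdateExpression='SET ImageObjectKey = :k',
    ExpressionAttributeValues={':k': object_key})
```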

There are important considerations with this approach:

  • Because DynamoDB does not support transactions that cross Amazon S3 and DynamoDB, your application must handle failures and clean up any orphaned Amazon S3 objects.

  • Amazon S3 limits the length of object identifiers, so you must organize your data in a way that accommodates this and other Amazon S3 constraints. For more information, go to the Amazon Simple Storage Service Developer Guide.

Break Up Large Attributes Across Multiple Items

If you need to store more data in a single item than DynamoDB allows, you can store that data in multiple items as chunks of a larger "virtual item". To get the best results, store the chunks in a separate table that has a simple primary key (partition key), and use batch operations (BatchGetItem and BatchWriteItem) to read and write the chunks. This approach will help you to spread your workload evenly across table partitions.

For example, consider the Forum, Thread, and Reply tables described in the Creating Tables and Loading Sample Data section. Items in the Reply table contain forum messages that were written by forum users. Because of the 400 KB item size limit in DynamoDB, the length of each reply is also limited. For large replies, instead of storing one item in the Reply table, break the reply message into chunks, and then write each chunk into its own item in a new ReplyChunks table with a simple primary key (partition key).

The primary key of each chunk would be a concatenation of the primary key of its "parent" reply item, a version number, and a sequence number. The sequence number determines the order of the chunks. The version number ensures that if a large reply is updated later, it will be updated atomically. In addition, chunks that were created before the update will not be mixed with chunks that were created after the update.

You would also need to update the "parent" reply item with the number of chunks, so that when you need to retrieve all the chunks for a reply, you will know how many chunks to look for.
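
A sketch of this write path (AWS SDK for Python, Boto3) might look like the following. The chunk size and helper name are illustrative, and error handling is omitted.

```python
import boto3

dynamodb = boto3.resource('dynamodb')
reply_table = dynamodb.Table('Reply')
chunk_table = dynamodb.Table('ReplyChunks')

CHUNK_SIZE = 300_000   # characters per chunk, safely below the 400 KB item limit

def write_large_reply(reply_id, reply_datetime, message, version):
    # Split the message into chunks (by character count, for simplicity).
    chunks = [message[i:i + CHUNK_SIZE]
              for i in range(0, len(message), CHUNK_SIZE)]

    # Each chunk's key concatenates the parent reply's key, the version
    # number, and a 1-based sequence number.
    with chunk_table.batch_writer() as batch:
        for seq, chunk in enumerate(chunks, start=1):
            batch.put_item(Item={
                'Id': f'{reply_id}#{reply_datetime}#{version}#{seq}',
                'Message': chunk})

    # Record the chunk count and version on the parent item so that readers
    # know how many chunks to fetch.
    reply_table.update_item(
        Key={'Id': reply_id, 'ReplyDateTime': reply_datetime},
        UpdateExpression='SET ChunkCount = :c, ChunkVersion = :v',
        ExpressionAttributeValues={':c': len(chunks), ':v': version})
```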

As an illustration, here is how these items might appear in the Reply and ReplyChunks tables:

Reply

Id                   ReplyDateTime                Message           ChunkCount   ChunkVersion
"DynamoDB#Thread1"   "2012-03-15T20:42:54.023Z"                     3            1
"DynamoDB#Thread2"   "2012-03-21T20:41:23.192Z"   "short message"

ReplyChunks

Id                                                Message
"DynamoDB#Thread1#2012-03-15T20:42:54.023Z#1#1"   "first part of long message text..."
"DynamoDB#Thread1#2012-03-15T20:42:54.023Z#1#3"   "...third part of long message text"
"DynamoDB#Thread1#2012-03-15T20:42:54.023Z#1#2"   "...second part of long message text..."

Here are important considerations with this approach:

  • Because DynamoDB does not support cross-item transactions, your application will need to deal with failure scenarios when writing multiple items and with inconsistencies between items when reading multiple items.

  • If your application retrieves a large amount of data all at once, it can generate nonuniform workloads, which can cause unexpected throttling. This is especially true when retrieving items that share a partition key value.

    Chunking large data items avoids this problem: because the chunks are stored in a separate table with a simple primary key (partition key), they are spread evenly across that table's partitions.

    A workable, but less optimal, solution would be to store each chunk in a table with a composite key, with the partition key being the primary key of the "parent" item. With this design choice, an application that retrieves all of the chunks of the same "parent" item would generate a nonuniform workload, with uneven I/O distribution across partitions.