Amazon DynamoDB
Developer Guide (API Version 2012-08-10)

Local Secondary Indexes

Some applications only need to query data using the table's primary key; however, there may be situations where an alternate range key would be helpful. To give your application a choice of range keys, you can create one or more local secondary indexes on a table and issue Query requests against these indexes.

For example, consider the Thread table that is defined in Example Tables and Data. This table is useful for an application such as the AWS Discussion Forums. The following diagram shows how the items in the table would be organized. (Not all of the attributes are shown.)

DynamoDB stores all of the items with the same hash key contiguously. In this example, given a particular ForumName, a Query operation could immediately locate all of the threads for that forum. Within a group of items with the same hash key, the items are sorted by range key. If the range key (Subject) is also provided in the query, DynamoDB can narrow down the results that are returned—for example, returning all of the threads in the "S3" forum that have a Subject beginning with the letter "a".

Some requests might require more complex data access patterns. For example:

  • Which forum threads get the most views and replies?

  • Which thread in a particular forum has the largest number of messages?

  • How many threads were posted in a particular forum within a particular time period?

To answer these questions, the Query action would not be sufficient. Instead, you would have to Scan the entire table. For a table with millions of items, this would consume a large amount of provisioned read throughput and take a long time to complete.

However, you can specify one or more local secondary indexes on non-key attributes, such as Replies or LastPostDateTime.

A local secondary index maintains an alternate range key for a given hash key. A local secondary index also contains a copy of some or all of the attributes from the table; you specify which attributes are projected into the local secondary index when you create the table. The data in a local secondary index is organized by the same hash key as the table, but with a different range key. This lets you access data items efficiently across this different dimension. For greater query flexibility, you can create up to five local secondary indexes per table.

Suppose that an application needs to find all of the threads that have been posted within the last three months. Without a local secondary index, the application would have to Scan the entire Thread table and discard any posts that were not within the specified time frame. With a local secondary index, a Query operation could use LastPostDateTime as a range key and find the data quickly.

The following diagram shows a local secondary index named LastPostIndex. Note that the hash key is the same as that of the Thread table, but the range key is LastPostDateTime.

Every local secondary index must meet the following conditions:

  • The hash key is the same as that of the source table.

  • The range key consists of a single attribute.

  • The range key attribute of the source table is projected into the index, where it acts as a non-key attribute.

In this example, the hash key is ForumName and the range key of the local secondary index is LastPostDateTime. In addition, the range key value from the source table (in this example, Subject) is projected into the index, but it is not a part of the index key. If an application needs a list that is based on ForumName and LastPostDateTime, it can issue a Query request against LastPostIndex. The query results are sorted by LastPostDateTime, and can be returned in ascending or descending order. The query can also apply key conditions, such as returning only items that have a LastPostDateTime within a particular time span.
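The organization described above can be pictured with a small in-memory model. This is only an illustrative sketch, not how DynamoDB is implemented: the sample rows, the build_index and query_between helpers, and the date values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows; the attribute names follow the Thread table above.
THREADS = [
    {"ForumName": "S3", "Subject": "aaa", "LastPostDateTime": "2012-09-15T00:00:00.000Z"},
    {"ForumName": "S3", "Subject": "bbb", "LastPostDateTime": "2012-07-01T00:00:00.000Z"},
    {"ForumName": "S3", "Subject": "ccc", "LastPostDateTime": "2012-10-02T00:00:00.000Z"},
    {"ForumName": "EC2", "Subject": "ddd", "LastPostDateTime": "2012-09-20T00:00:00.000Z"},
]


def build_index(items, hash_key, range_key):
    """Group items by hash key and sort each group by the alternate range key."""
    index = defaultdict(list)
    for item in items:
        if range_key in item:  # items that lack the index key are not indexed
            index[item[hash_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_key])
    return index


def query_between(index, hash_value, range_key, low, high, ascending=True):
    """Return the items for one hash key whose range key lies in [low, high]."""
    matches = [it for it in index.get(hash_value, []) if low <= it[range_key] <= high]
    return matches if ascending else list(reversed(matches))


last_post_index = build_index(THREADS, "ForumName", "LastPostDateTime")
recent = query_between(last_post_index, "S3", "LastPostDateTime",
                       "2012-08-31T00:00:00.000Z", "2012-11-30T00:00:00.000Z")
print([it["Subject"] for it in recent])  # ['aaa', 'ccc']
```

Because each group is already sorted by LastPostDateTime, a time-span condition only has to look at one forum's entries instead of the whole table.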

Every local secondary index automatically contains the hash and range attributes from its parent table; you can optionally project non-key attributes into the index. When you query the index, DynamoDB can retrieve these projected attributes efficiently. A query against a local secondary index can also retrieve attributes that are not projected into the index: DynamoDB automatically fetches these attributes from the table, but at greater latency and with higher provisioned throughput costs.

For any local secondary index, you can store up to 10 GB of data per distinct hash key value. This figure includes all of the items in the table, plus all of the items in the indexes, that have the same hash key. For more information, see Item Collections.

Attribute Projections

With LastPostIndex, an application could use ForumName and LastPostDateTime as query criteria; however, to retrieve any additional attributes, DynamoDB would need to perform additional read operations against the Thread table. These extra reads are known as fetches, and they can increase the total amount of provisioned throughput required for a query.

Suppose that you wanted to populate a web page with a list of all the threads in "S3" and the number of replies for each thread, sorted by the last reply date/time beginning with the most recent reply. To populate this list, you would need the following attributes:

  • Subject

  • Replies

  • LastPostDateTime

The most efficient way to query this data, and to avoid fetch operations, would be to project the Replies attribute from the table into the local secondary index, as shown in this diagram:

A projection is the set of attributes that is copied from a table into a secondary index. The hash and range keys of the table are always projected into the index; you can project other attributes to support your application's query requirements. When you query an index, Amazon DynamoDB can access any attribute in the projection as if those attributes were in a table of their own.

When you create a secondary index, you need to specify the attributes that will be projected into the index. The CreateTable action provides three options for doing this:

  • KEYS_ONLY – Each item in the index consists only of the table hash and range key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.

  • INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.

  • ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.

In the previous diagram, the non-key attribute Replies is projected into LastPostIndex. An application can query LastPostIndex instead of the full Thread table to populate a web page with Subject, Replies and LastPostDateTime. If any other non-key attributes are requested, DynamoDB would need to fetch those attributes from the Thread table.

From an application's point of view, fetching additional attributes from the table is automatic and transparent, so there is no need to rewrite any application logic. However, note that such fetching can greatly reduce the performance advantage of using a local secondary index.
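To reason about when a query will trigger fetches, you can compare the requested attributes with the projection. The following is a minimal sketch under the projection rules described above; the helper names projected_attributes and attributes_requiring_fetch are hypothetical and are not part of any DynamoDB API.

```python
def projected_attributes(projection_type, table_keys, index_key,
                         non_key_attributes=(), all_attributes=()):
    """Return the set of attribute names stored in the index for a projection type."""
    keys = set(table_keys) | {index_key}  # table keys and index key are always present
    if projection_type == "KEYS_ONLY":
        return keys
    if projection_type == "INCLUDE":
        return keys | set(non_key_attributes)
    if projection_type == "ALL":
        return keys | set(all_attributes)
    raise ValueError(f"unknown projection type: {projection_type}")


def attributes_requiring_fetch(requested, projected):
    """Attributes that are requested but not projected must be fetched from the table."""
    return sorted(set(requested) - set(projected))


# LastPostIndex as described above: INCLUDE projection with Replies.
projected = projected_attributes(
    "INCLUDE",
    table_keys=["ForumName", "Subject"],
    index_key="LastPostDateTime",
    non_key_attributes=["Replies"],
)
print(attributes_requiring_fetch(["Subject", "Replies", "LastPostDateTime"], projected))  # []
print(attributes_requiring_fetch(["Subject", "Replies", "Tags"], projected))  # ['Tags']
```

An empty result means the index alone can satisfy the query; any name in the result implies extra reads against the Thread table.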

When you choose the attributes to project into a local secondary index, you must consider the tradeoff between provisioned throughput costs and storage costs:

  • If you need to access just a few attributes with the lowest possible latency, consider projecting only those attributes into a local secondary index. The smaller the index, the less it costs to store and the lower your write costs. If there are attributes that you occasionally need to fetch, the cost for provisioned throughput may well outweigh the longer-term cost of storing those attributes.

  • If your application will frequently access some non-key attributes, you should consider projecting those attributes into a local secondary index. The additional storage cost for the local secondary index is likely to be outweighed by the savings from not having to fetch those attributes from the table on every query.

  • If you need to access most of the non-key attributes on a frequent basis, you can project these attributes—or even the entire source table— into a local secondary index. This will give you maximum flexibility and lowest provisioned throughput consumption, because no fetching would be required; however, your storage cost would increase, or even double if you are projecting all attributes.

  • If your application needs to query a table infrequently, but must perform many writes or updates against the data in the table, consider projecting KEYS_ONLY. The local secondary index would be of minimal size, but would still be available when needed for query activity.

Creating a Local Secondary Index

To create one or more local secondary indexes on a table, use the LocalSecondaryIndexes parameter of the CreateTable operation. Local secondary indexes on a table are created when the table is created. When you delete a table, any local secondary indexes on that table are also deleted.

You must specify one non-key attribute for the range key of the local secondary index. The attribute that you choose must be a scalar data type, not a multi-value set. For a complete list of data types, see DynamoDB Data Types.

Important

For tables with local secondary indexes, there is a 10 GB size limit per hash key. A table with local secondary indexes can store any number of items, as long as the total size for any one hash key does not exceed 10 GB. For more information, see Item Collection Size Limit.

You can project attributes of any data type into a local secondary index. This includes scalar data types and multi-valued sets. For a complete list of data types, see DynamoDB Data Types.

For an example CreateTable request that includes a local secondary index, go to CreateTable in the Amazon DynamoDB API Reference.
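As a rough illustration, the request body for the Thread table and LastPostIndex might be assembled as follows. This sketch only builds the payload as a Python dictionary and does not call the service. The capacity values are placeholders, the thread_table_request helper is hypothetical, and the field names reflect the CreateTable request format as understood here, so verify them against the API Reference before relying on them.

```python
import json


def thread_table_request(read_units=5, write_units=5):
    """Build a CreateTable request body for Thread with the LastPostIndex index."""
    return {
        "TableName": "Thread",
        # Every key attribute (table or index) needs a declared scalar type.
        "AttributeDefinitions": [
            {"AttributeName": "ForumName", "AttributeType": "S"},
            {"AttributeName": "Subject", "AttributeType": "S"},
            {"AttributeName": "LastPostDateTime", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "ForumName", "KeyType": "HASH"},
            {"AttributeName": "Subject", "KeyType": "RANGE"},
        ],
        "LocalSecondaryIndexes": [
            {
                "IndexName": "LastPostIndex",
                # Same hash key as the table, different range key.
                "KeySchema": [
                    {"AttributeName": "ForumName", "KeyType": "HASH"},
                    {"AttributeName": "LastPostDateTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["Replies"]},
            }
        ],
        # Placeholder capacity values, chosen only for illustration.
        "ProvisionedThroughput": {"ReadCapacityUnits": read_units, "WriteCapacityUnits": write_units},
    }


request = thread_table_request()
print(json.dumps(request, indent=2))
```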

Querying a Local Secondary Index

In a DynamoDB table, the combined hash key and range key value for each item must be unique. However, in a local secondary index, the range key value does not need to be unique for a given hash key value. If there are multiple items in the local secondary index that have the same range key value, a Query operation will return all of the items that have the same hash key value. In the response, the matching items are not returned in any particular order.

You can query a local secondary index using either eventually consistent or strongly consistent reads. To specify which type of consistency you want, use the ConsistentRead parameter of the Query operation. A strongly consistent read from a local secondary index will always return the latest updated values. If the query needs to fetch additional attributes from the table, then those attributes will be consistent with respect to the index.

Example

Consider the following JSON payload from a Query, which requests data from the discussion threads in a particular forum:

{
    "TableName": "Thread",
    "IndexName": "LastPostIndex",
    "KeyConditions": {
        "ForumName": {
            "ComparisonOperator": "EQ", 
            "AttributeValueList": [ 
                {"S": "EC2"}
            ]
        },
        "LastPostDateTime": {
            "ComparisonOperator": "BETWEEN", 
            "AttributeValueList": [ 
                {"S": "2012-08-31T00:00:00.000Z"}, 
                {"S": "2012-11-30T00:00:00.000Z"}
            ]
        }  
    },
    "AttributesToGet": ["Subject", "LastPostDateTime", "Replies", "Tags"],
    "ConsistentRead": false
}

In this query:

  • DynamoDB accesses LastPostIndex, using the ForumName hash key to locate the index items for "EC2". All of the index items with this key are stored adjacent to each other for rapid retrieval.

  • Within this forum, DynamoDB uses the index to look up the keys that match the specified LastPostDateTime condition.

  • Because the Replies attribute is projected into the index, DynamoDB can retrieve this attribute without consuming any additional provisioned throughput.

  • The Tags attribute is not projected into the index, so DynamoDB must access the Thread table and fetch this attribute.

  • The results are returned, sorted by LastPostDateTime. The index entries are sorted by hash key and then by range key, and Query returns them in the order they are stored. (To return the results in descending order, set the ScanIndexForward parameter to false.)

Because the Tags attribute is not projected into the local secondary index, DynamoDB must consume additional read capacity units to fetch this attribute from the table. If you need to run this query often, you should project Tags into LastPostIndex to avoid fetching from the table; however, if you needed to access Tags only occasionally, then the additional storage cost for projecting Tags into the index might not be worthwhile.
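If your application issues this kind of query for different forums and time spans, it can help to generate the payload from parameters. The sketch below mirrors the request format used in the example above; the last_post_query helper and its argument names are hypothetical, and the function only builds the request without sending it.

```python
def last_post_query(forum, start, end, attributes, newest_first=False, consistent=False):
    """Build a Query request against LastPostIndex for one forum and time span."""
    return {
        "TableName": "Thread",
        "IndexName": "LastPostIndex",
        "KeyConditions": {
            "ForumName": {
                "ComparisonOperator": "EQ",
                "AttributeValueList": [{"S": forum}],
            },
            "LastPostDateTime": {
                "ComparisonOperator": "BETWEEN",
                "AttributeValueList": [{"S": start}, {"S": end}],
            },
        },
        "AttributesToGet": list(attributes),
        "ConsistentRead": consistent,
        # False returns the most recent LastPostDateTime first.
        "ScanIndexForward": not newest_first,
    }


q = last_post_query(
    "EC2",
    "2012-08-31T00:00:00.000Z",
    "2012-11-30T00:00:00.000Z",
    ["Subject", "LastPostDateTime", "Replies"],  # all projected, so no table fetch
    newest_first=True,
)
print(q["ScanIndexForward"])  # False
```

Leaving Tags out of the attribute list, as in this call, keeps the query within the index.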


Item Writes and Local Secondary Indexes

DynamoDB automatically keeps all local secondary indexes synchronized with their respective tables. Applications never write directly to an index. However, it is important that you understand the implications of how DynamoDB maintains these indexes.

When you create a local secondary index, you specify an attribute to serve as the range key for the index. You also specify a data type for that attribute. This means that whenever you write an item to the table, if the item defines an index key attribute, its type must match the index key schema's data type. In the case of LastPostIndex, the LastPostDateTime range key in the index is defined as a String data type. If you attempt to add an item to the Thread table and specify a different data type for LastPostDateTime (such as Number), DynamoDB will return a ValidationException because of the data type mismatch.

If you write an item to a table, you don't have to specify the attributes for any local secondary index range key. Using LastPostIndex as an example, you would not need to specify a value for the LastPostDateTime attribute in order to write a new item to the Thread table. In this case, DynamoDB does not write any data to the index for this particular item.

There is no requirement for a one-to-one relationship between the items in a table and the items in a local secondary index; in fact, this behavior can be advantageous for many applications. For more information, see Take Advantage of Sparse Indexes.
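The two write-time rules above (type checking of the index key, and skipping items that do not define it) can be sketched as follows. This is a simplified model, not DynamoDB's implementation: ValidationError is a local stand-in for the ValidationException returned by the service, and index_entry is a hypothetical helper.

```python
class ValidationError(Exception):
    """Local stand-in for the ValidationException that DynamoDB returns."""


def index_entry(item, index_key, expected_type):
    """Return the item to index, None if it is skipped, or raise on a type mismatch."""
    if index_key not in item:
        return None  # sparse index: nothing is written for this item
    if not isinstance(item[index_key], expected_type):
        raise ValidationError(f"{index_key} must be of type {expected_type.__name__}")
    return item


no_date = {"ForumName": "S3", "Subject": "x"}
dated = {"ForumName": "S3", "Subject": "y", "LastPostDateTime": "2012-09-15T00:00:00.000Z"}

print(index_entry(no_date, "LastPostDateTime", str))           # None
print(index_entry(dated, "LastPostDateTime", str) is dated)    # True
```

Passing an item whose LastPostDateTime is a number rather than a string raises ValidationError in this model, which corresponds to the data type mismatch described above.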

A table with many local secondary indexes will incur higher costs for write activity than tables with fewer indexes. For more information, see Provisioned Throughput Considerations for Local Secondary Indexes.

Provisioned Throughput Considerations for Local Secondary Indexes

When you create a table in DynamoDB, you provision read and write capacity units for the table's expected workload. That workload includes read and write activity on the table's local secondary indexes.

To view the current rates for provisioned throughput capacity, go to http://aws.amazon.com/dynamodb/pricing.

Read Capacity Units

When you query a local secondary index, the number of read capacity units consumed depends on how the data is accessed.

As with table queries, an index query can use either eventually consistent or strongly consistent reads depending on the value of ConsistentRead. One strongly consistent read consumes one read capacity unit; an eventually consistent read consumes only half of that. Thus, by choosing eventually consistent reads, you can reduce your read capacity unit charges.

For index queries that request only index keys and projected attributes, DynamoDB calculates the provisioned read activity in the same way as it does for queries against tables. The only difference is that the calculation is based on the sizes of the index entries, rather than the size of the item in the table. The number of read capacity units is the sum of all projected attribute sizes across all of the items returned; the result is then rounded up to the next 4 KB boundary. For more information on how DynamoDB calculates provisioned throughput usage, see Specifying Read and Write Requirements for Tables.

For index queries that read attributes that are not projected into the local secondary index, DynamoDB will need to fetch those attributes from the table, in addition to reading the projected attributes from the index. These fetches occur when you include any non-projected attributes in the Select or AttributesToGet parameters of the Query operation. Fetching causes additional latency in query responses, and it also incurs a higher provisioned throughput cost: In addition to the reads from the local secondary index described above, you are charged for read capacity units for every table item fetched. This charge is for reading each entire item from the table, not just the requested attributes.

The maximum size of the results returned by a Query operation is 1 MB; this includes the sizes of all the attribute names and values across all of the items returned. However, if a Query against a local secondary index causes DynamoDB to fetch item attributes from the table, the maximum size of the data in the results might be lower. In this case, the result size is the sum of:

  • The size of the matching items in the index, rounded up to the next 4 KB.

  • The size of each matching item in the table, with each item individually rounded up to the next 4 KB.

Using this formula, the maximum size of the results returned by a Query operation is still 1 MB.

For example, consider a table where the size of each item is 300 bytes. There is a local secondary index on that table, but only 200 bytes of each item is projected into the index. Now suppose that you Query this index, that the query requires table fetches for each item, and that the query returns 4 items. DynamoDB sums up the following:

  • The size of the matching items in the index: 200 bytes × 4 items = 800 bytes; this is then rounded up to 4 KB.

  • The size of each matching item in the table: (300 bytes, rounded up to 4 KB) × 4 items = 16 KB.

The total size of the data in the result is therefore 20 KB.
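The arithmetic in this example can be reproduced with a few lines of code. This is a sketch of the calculation as described in this section; the function names are hypothetical, and the conversion to read capacity units assumes one unit per 4 KB for strongly consistent reads and half of that for eventually consistent reads.

```python
import math

KB = 1024
READ_UNIT_BYTES = 4 * KB  # reads are rounded up to 4 KB boundaries


def round_up_4kb(size_bytes):
    """Round a size in bytes up to the next multiple of 4 KB."""
    return math.ceil(size_bytes / READ_UNIT_BYTES) * READ_UNIT_BYTES


def fetched_query_size(index_entry_bytes, table_item_bytes, item_count):
    """Result size for a query that fetches every matching item from the table."""
    index_part = round_up_4kb(index_entry_bytes * item_count)  # rounded once, as a total
    table_part = round_up_4kb(table_item_bytes) * item_count   # rounded per item
    return index_part + table_part


def read_capacity_units(size_bytes, consistent=True):
    """Assumed conversion: one unit per 4 KB, halved for eventually consistent reads."""
    units = math.ceil(size_bytes / READ_UNIT_BYTES)
    return units if consistent else units / 2


total = fetched_query_size(index_entry_bytes=200, table_item_bytes=300, item_count=4)
print(total // KB)                                   # 20
print(read_capacity_units(total))                    # 5
print(read_capacity_units(total, consistent=False))  # 2.5
```

Note how the per-item rounding dominates: 1,200 bytes of table data account for 16 KB of the total, which is why fetches are expensive for small items.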

Write Capacity Units

When an item in a table is added, updated, or deleted, updating the local secondary indexes will consume provisioned write capacity units for the table. The total provisioned throughput cost for a write is the sum of write capacity units consumed by writing to the table and those consumed by updating the local secondary indexes.

The cost of writing an item to a local secondary index depends on several factors:

  • If you write a new item to the table that defines an indexed attribute, or you update an existing item to define a previously undefined indexed attribute, one write operation is required to put the item into the index.

  • If an update to the table changes the value of an indexed key attribute (from A to B), two writes are required: one to delete the previous item from the index and another to put the new item into the index.

  • If an item was present in the index, but a write to the table caused the indexed attribute to be deleted, one write is required to delete the old item projection from the index.

  • If an item is not present in the index before or after the item is updated, there is no additional write cost for the index.

All of these factors assume that the size of each item in the index is less than or equal to the 1 KB item size for calculating write capacity units. Larger index entries will require additional write capacity units. You can minimize your write costs by considering which attributes your queries will need to return and projecting only those attributes into the index.
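The four cases listed above can be summarized as a small decision function. This is a hedged sketch that counts index writes for a single local secondary index and assumes index entries of 1 KB or less; index_write_count is a hypothetical helper, and None stands for an item that does not define the indexed attribute. The list above does not describe an update that leaves the index key value unchanged, so the sketch deliberately refuses to guess at that case.

```python
def index_write_count(old_key, new_key):
    """Number of index writes caused by one table write, per the cases listed above.

    old_key / new_key are the indexed attribute's values before and after the
    write; None means the attribute is not defined on the item.
    """
    if old_key is None and new_key is None:
        return 0  # item is absent from the index before and after
    if old_key is None:
        return 1  # put the new entry into the index
    if new_key is None:
        return 1  # delete the old entry from the index
    if old_key != new_key:
        return 2  # delete the old entry, then put the new one
    raise ValueError("an unchanged index key is not covered by the cases above")


print(index_write_count(None, "2012-09-15T00:00:00.000Z"))  # 1
print(index_write_count("A", "B"))                          # 2
print(index_write_count("A", None))                         # 1
print(index_write_count(None, None))                        # 0
```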

Storage Considerations for Local Secondary Indexes

When an application writes an item to a table, DynamoDB automatically copies the correct subset of attributes to any local secondary indexes in which those attributes should appear. Your AWS account is charged for storage of the item in the table and also for storage of attributes in any local secondary indexes on that table.

The amount of space used by an index item is the sum of the following:

  • The size in bytes of the table primary key (hash and range key attributes)

  • The size in bytes of the index key attribute

  • The size in bytes of the projected attributes (if any)

  • 100 bytes of overhead per index item

To estimate the storage requirements for a local secondary index, you can estimate the average size of an item in the index and then multiply by the number of items in the table.

If a table contains an item where a particular attribute is not defined, but that attribute is defined as an index range key, then DynamoDB does not write any data for that item to the index. For more information about this behavior, see Take Advantage of Sparse Indexes.
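The storage estimate described above is simple arithmetic, sketched below. The byte counts in the example call are made-up values chosen only for illustration, and the function names are hypothetical. Because items that do not define the index range key are not written to the index, the sketch multiplies by the number of indexed items rather than by the total item count.

```python
INDEX_ITEM_OVERHEAD_BYTES = 100  # per-item overhead stated above


def index_item_size(table_key_bytes, index_key_bytes, projected_bytes=0):
    """Space used by one index item: table keys + index key + projection + overhead."""
    return table_key_bytes + index_key_bytes + projected_bytes + INDEX_ITEM_OVERHEAD_BYTES


def estimate_index_storage(avg_item_bytes, indexed_item_count):
    """Estimated index size: average index item size times the number of indexed items."""
    return avg_item_bytes * indexed_item_count


# Hypothetical sizes: 40 bytes of table keys, a 24-byte index key, 8 bytes projected.
size = index_item_size(table_key_bytes=40, index_key_bytes=24, projected_bytes=8)
print(size)                                      # 172
print(estimate_index_storage(size, 1_000_000))   # 172000000
```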

Item Collections

Note

The following section pertains only to tables that have local secondary indexes.

In DynamoDB, an item collection is any group of items that have the same hash key, in a table and all of its local secondary indexes. In the examples used throughout this section, the hash key for the Thread table is ForumName, and the hash key for LastPostIndex is also ForumName. All the table and index items with the same ForumName are part of the same item collection. For example, in the Thread table and the LastPostIndex local secondary index, there is an item collection for forum EC2 and a different item collection for forum RDS.

The following diagram shows the item collection for forum S3:

In this diagram, the item collection consists of all the items in Thread and LastPostIndex where the ForumName hash key is "S3". If there were other local secondary indexes on the table, then any items in those indexes with ForumName equal to "S3" would also be part of the item collection.
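One way to think about an item collection is as a running total, per hash key value, over the table items and every local secondary index entry. The following sketch assumes each record carries a hypothetical size_bytes field; the sample sizes are invented for illustration and DynamoDB does not expose sizes in this form.

```python
from collections import defaultdict


def item_collection_sizes(table_items, index_entries, hash_key="ForumName"):
    """Sum the sizes of table items and index entries that share a hash key value."""
    sizes = defaultdict(int)
    for record in list(table_items) + list(index_entries):
        sizes[record[hash_key]] += record["size_bytes"]
    return dict(sizes)


# Hypothetical records: two S3 threads and one EC2 thread, each with one index entry.
table_items = [
    {"ForumName": "S3", "size_bytes": 300},
    {"ForumName": "S3", "size_bytes": 300},
    {"ForumName": "EC2", "size_bytes": 300},
]
index_entries = [
    {"ForumName": "S3", "size_bytes": 200},
    {"ForumName": "S3", "size_bytes": 200},
    {"ForumName": "EC2", "size_bytes": 200},
]

print(item_collection_sizes(table_items, index_entries))  # {'S3': 1000, 'EC2': 500}
```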

You can use any of the following operations in DynamoDB to return information about item collections:

  • BatchWriteItem

  • DeleteItem

  • PutItem

  • UpdateItem

Each of these operations supports the ReturnItemCollectionMetrics parameter. When you set this parameter to SIZE, you can view information about the size of each item collection in the index.

Example

Here is a JSON snippet from the output of an UpdateItem operation on the Thread table, with ReturnItemCollectionMetrics set to SIZE. The item that was updated had a ForumName value of "EC2", so the output includes information about that item collection.

{
    "ItemCollectionMetrics": {
        "ItemCollectionKey": {
            "ForumName": {"S": "EC2"}
        },
        "SizeEstimateRangeGB": [0.0,1.0]
    }
}

The SizeEstimateRangeGB object shows that the size of this item collection is between 0 and 1 gigabyte. DynamoDB periodically updates this size estimate, so the numbers might be different next time the item is modified.


Item Collection Size Limit

The maximum size of any item collection is 10 GB. This limit does not apply to tables without local secondary indexes; only tables that have one or more local secondary indexes are affected.

If an item collection exceeds the 10 GB limit, DynamoDB will return an ItemCollectionSizeLimitExceededException and you won't be able to add more items to the item collection or increase the sizes of items that are in the item collection. (Read and write operations that shrink the size of the item collection are still allowed.) You will still be able to add items to other item collections.

To reduce the size of an item collection, you can do one of the following:

  • Delete any unnecessary items with the hash key in question. When you delete these items from the table, DynamoDB will also remove any index entries that have the same hash key.

  • Update the items by removing attributes or by reducing the size of the attributes. If these attributes are projected into any local secondary indexes, DynamoDB will also reduce the size of the corresponding index entries.

  • Create a new table with the same hash and range key, and then move items from the old table to the new table. This might be a good approach if a table has historical data that is infrequently accessed. You might also consider archiving this historical data to Amazon Simple Storage Service (Amazon S3).

When the total size of the item collection drops below 10 GB, you will once again be able to add items with the same hash key.

We recommend as a best practice that you instrument your application to monitor the sizes of your item collections. One way to do so is to set the ReturnItemCollectionMetrics parameter to SIZE whenever you use BatchWriteItem, DeleteItem, PutItem or UpdateItem. Your application should examine the ItemCollectionMetrics object in the output and log an error message whenever an item collection exceeds a user-defined limit (8 GB, for example). Setting a limit that is less than 10 GB would provide an early warning system so you know that an item collection is approaching the limit in time to do something about it.
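A monitoring check along these lines might look like the following sketch. It assumes the response has already been parsed into a dictionary shaped like the UpdateItem output shown earlier (BatchWriteItem output may be structured differently and is not handled here). The check_item_collection helper, the logger name, and the 8 GB threshold are illustrative choices.

```python
import logging

logger = logging.getLogger("item_collection_monitor")
WARN_THRESHOLD_GB = 8.0  # user-defined early-warning limit, below the 10 GB maximum


def check_item_collection(response, threshold_gb=WARN_THRESHOLD_GB):
    """Log an error and return True if the size estimate passes the threshold."""
    metrics = response.get("ItemCollectionMetrics")
    if not metrics:
        return False  # metrics were not requested or not returned
    lower, upper = metrics["SizeEstimateRangeGB"]
    if upper > threshold_gb:
        logger.error(
            "Item collection %s is estimated at %.1f-%.1f GB",
            metrics["ItemCollectionKey"], lower, upper,
        )
        return True
    return False


response = {
    "ItemCollectionMetrics": {
        "ItemCollectionKey": {"ForumName": {"S": "EC2"}},
        "SizeEstimateRangeGB": [0.0, 1.0],
    }
}
print(check_item_collection(response))  # False
```

Comparing the upper bound of the estimate against the threshold is a conservative choice: the size is only an estimate that is updated periodically, so it is safer to warn early than late.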

Item Collections and Partitions

The table and index data for each item collection is stored in a single partition. Referring to the Thread table example, all of the table and index items with the same ForumName attribute would be stored in the same partition. The "S3" item collection would be stored on one partition, "EC2" in another partition, and "RDS" in a third partition.

You should design your applications so that table data is evenly distributed across distinct hash key values. For tables with local secondary indexes, your applications should not create "hot spots" of read and write activity within a single item collection on a single partition.