Amazon DynamoDB
Developer Guide (API Version 2012-08-10)

Global Secondary Indexes

Some applications might need to perform many kinds of queries, using a variety of different attributes as query criteria. To support these requirements, you can create one or more global secondary indexes and issue Query requests against these indexes. To illustrate, consider a table named GameScores that keeps track of users and scores for a mobile gaming application. Each item in GameScores is identified by a hash key (UserId) and a range key (GameTitle). The following diagram shows how the items in the table would be organized. (Not all of the attributes are shown.)

Now suppose that you wanted to write a leaderboard application to display top scores for each game. A query that specified the key attributes (UserId and GameTitle) would be very efficient; however, if the application needed to retrieve data from GameScores based on GameTitle only, it would need to use a Scan operation. As more items are added to the table, scans of all the data would become slow and inefficient, making it difficult to answer questions such as these:

  • What is the top score ever recorded for the game Meteor Blasters?

  • Which user had the highest score for Galaxy Invaders?

  • What was the highest ratio of wins vs. losses?

To speed up queries on non-key attributes, you can create a global secondary index. A global secondary index contains a selection of attributes from the table, but they are organized by a primary key that is different from that of the table. The index key does not need to include any of the key attributes from the table; it doesn't even need to have the same key schema as the table.

For example, you could create a global secondary index named GameTitleIndex, with a hash key of GameTitle and a range key of TopScore. Since the table's primary key attributes are always projected into an index, the UserId attribute is also present. The following diagram shows what GameTitleIndex would look like:

Now you can query GameTitleIndex and easily obtain the scores for Meteor Blasters. The results are ordered by the range key, TopScore. If you set the ScanIndexForward parameter to false, the results are returned in descending order, so the highest score is returned first.
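The behavior described above can be pictured with a small in-memory model. This is only a sketch, not how DynamoDB is implemented: the helper names build_index and query_index are hypothetical, and the sample scores are made up for illustration. It groups items by the index hash key, sorts each group by the range key, and reverses the order when scan_index_forward is false.

```python
from collections import defaultdict

# Sample rows; the values are illustrative, not taken from the guide's diagram.
game_scores = [
    {"UserId": "101", "GameTitle": "Galaxy Invaders", "TopScore": 5842},
    {"UserId": "101", "GameTitle": "Meteor Blasters", "TopScore": 1000},
    {"UserId": "102", "GameTitle": "Meteor Blasters", "TopScore": 3500},
    {"UserId": "103", "GameTitle": "Meteor Blasters", "TopScore": 2200},
]

def build_index(items, hash_key, range_key):
    """Group items by the index hash key and sort each group by the range key."""
    index = defaultdict(list)
    for item in items:
        index[item[hash_key]].append(item)
    for group in index.values():
        group.sort(key=lambda it: it[range_key])
    return index

def query_index(index, hash_value, scan_index_forward=True):
    """Return one hash key's items, ascending by default, descending otherwise."""
    group = index.get(hash_value, [])
    return list(group) if scan_index_forward else list(reversed(group))

game_title_index = build_index(game_scores, "GameTitle", "TopScore")
results = query_index(game_title_index, "Meteor Blasters", scan_index_forward=False)
top = results[0]
print(top["UserId"], top["TopScore"])  # 102 3500
```

Because the results come back in descending TopScore order, the first element is the highest score for the game.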

Every global secondary index must have a hash key, and can have an optional range key. The index key schema can be different from the table schema; you could have a table with a hash type primary key and create a global secondary index with a hash-and-range index key, or vice versa. The index key attributes can consist of any attributes from the table, as long as the data types are scalar rather than multi-value sets.

You can project other table attributes into the index if you want. When you query the index, DynamoDB can retrieve these projected attributes efficiently; however, global secondary index queries cannot fetch attributes from the parent table. For example, if you queried GameTitleIndex, as shown in the diagram above, the query would not be able to access any attributes other than GameTitle, TopScore, and UserId (the table's key attributes are always projected).

In a DynamoDB table, each key value must be unique. However, the key values in a global secondary index do not need to be unique. To illustrate, suppose that a game named Comet Quest is especially difficult, with many new users trying but failing to get a score above zero. Here is some data that we could use to represent this:

UserId    GameTitle      TopScore
123       Comet Quest    0
201       Comet Quest    0
301       Comet Quest    0

When this data is added to the GameScores table, DynamoDB will propagate it to GameTitleIndex. If we then query the index using Comet Quest for GameTitle and 0 for TopScore, the following data is returned:

UserId    GameTitle      TopScore
123       Comet Quest    0
201       Comet Quest    0
301       Comet Quest    0

Only the items with the specified key values appear in the response; within that set of data, the items are in no particular order.

A global secondary index only keeps track of data items where its key attribute(s) actually exist. For example, suppose that you added another new item to the GameScores table, but only provided the required primary key attributes:

UserId    GameTitle
400       Comet Quest

Because you didn't specify the TopScore attribute, DynamoDB would not propagate this item to GameTitleIndex. Thus, if you queried GameScores for all the Comet Quest items, you would get the following four items:

UserId    GameTitle      TopScore
123       Comet Quest    0
201       Comet Quest    0
301       Comet Quest    0
400       Comet Quest

A similar query on GameTitleIndex would still return three items, rather than four, because the item with the nonexistent TopScore is not propagated to the index.
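The Comet Quest example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: propagate is a hypothetical helper that mimics the rule that only items defining every index key attribute appear in the index, and UserId is represented as a string purely for illustration.

```python
# The four Comet Quest items from the example; user 400 has no TopScore.
game_scores = [
    {"UserId": "123", "GameTitle": "Comet Quest", "TopScore": 0},
    {"UserId": "201", "GameTitle": "Comet Quest", "TopScore": 0},
    {"UserId": "301", "GameTitle": "Comet Quest", "TopScore": 0},
    {"UserId": "400", "GameTitle": "Comet Quest"},
]

def propagate(items, index_key_attrs):
    """Keep only the items that define every index key attribute."""
    return [it for it in items if all(attr in it for attr in index_key_attrs)]

# GameTitleIndex is keyed on GameTitle (hash) and TopScore (range).
index_items = propagate(game_scores, ["GameTitle", "TopScore"])
print(len(game_scores), len(index_items))  # 4 3
```

Note that a TopScore of 0 is still a defined value, so those three items are indexed; only the item with no TopScore attribute at all is left out.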

Attribute Projections

A projection is the set of attributes that is copied from a table into a secondary index. The hash and range keys of the table are always projected into the index; you can project other attributes to support your application's query requirements. When you query an index, Amazon DynamoDB can access any attribute in the projection as if those attributes were in a table of their own.

When you create a secondary index, you need to specify the attributes that will be projected into the index. The CreateTable action provides three options for doing this:

  • KEYS_ONLY – Each item in the index consists only of the table hash and range key values, plus the index key values. The KEYS_ONLY option results in the smallest possible secondary index.

  • INCLUDE – In addition to the attributes described in KEYS_ONLY, the secondary index will include other non-key attributes that you specify.

  • ALL – The secondary index includes all of the attributes from the source table. Because all of the table data is duplicated in the index, an ALL projection results in the largest possible secondary index.
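As a rough illustration of the three options, the following sketch computes which attributes of a single item would be copied into an index. The project function is hypothetical, and the sample attribute values are made up; only the option names and their meaning come from the list above.

```python
def project(item, table_keys, index_keys, projection_type, non_key_attributes=()):
    """Return the attributes of `item` that would be copied into the index."""
    if projection_type == "ALL":
        return dict(item)
    # KEYS_ONLY: table key attributes plus index key attributes.
    wanted = set(table_keys) | set(index_keys)
    if projection_type == "INCLUDE":
        wanted |= set(non_key_attributes)
    elif projection_type != "KEYS_ONLY":
        raise ValueError(f"unknown projection type: {projection_type}")
    return {name: value for name, value in item.items() if name in wanted}

# Illustrative item; the values are assumptions.
item = {
    "UserId": "101",
    "GameTitle": "Meteor Blasters",
    "TopScore": 1000,
    "TopScoreDateTime": "2013-10-02T09:30:00",
    "Wins": 21,
    "Losses": 72,
}
table_keys = ["UserId", "GameTitle"]
index_keys = ["GameTitle", "TopScore"]

keys_only = project(item, table_keys, index_keys, "KEYS_ONLY")
include = project(item, table_keys, index_keys, "INCLUDE", ["Wins", "Losses"])
everything = project(item, table_keys, index_keys, "ALL")
print(sorted(keys_only))  # ['GameTitle', 'TopScore', 'UserId']
```

The size of each result grows from KEYS_ONLY to INCLUDE to ALL, which mirrors the storage tradeoff between the options.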

In the diagram above, GameTitleIndex does not have any additional projected attributes. An application can use GameTitle and TopScore in queries; however, it is not possible to efficiently determine which user has the highest score for a particular game, or the highest ratio of wins vs. losses. The most efficient way to support queries on this data would be to project these attributes from the table into the global secondary index, as shown in this diagram:

Because the non-key attributes Wins and Losses are projected into the index, an application can determine the wins vs. losses ratio for any game, or for any combination of game and user ID.

When you choose the attributes to project into a global secondary index, you must consider the tradeoff between provisioned throughput costs and storage costs:

  • If you need to access just a few attributes with the lowest possible latency, consider projecting only those attributes into a global secondary index. The smaller the index, the less that it will cost to store it, and the less your write costs will be.

  • If your application will frequently access some non-key attributes, you should consider projecting those attributes into a global secondary index. The additional storage cost for the global secondary index is offset by avoiding the cost of frequent table scans.

  • If you need to access most of the non-key attributes on a frequent basis, you can project these attributes, or even the entire source table, into a global secondary index. This will give you maximum flexibility; however, your storage cost would increase, or even double.

  • If your application needs to query a table infrequently, but must perform many writes or updates against the data in the table, consider projecting KEYS_ONLY. The global secondary index would be of minimal size, but would still be available when needed for query activity.

Creating a Global Secondary Index

To create one or more global secondary indexes on a table, use the GlobalSecondaryIndexes parameter of the CreateTable operation. Global secondary indexes on a table are created when the table is created. When you delete a table, any global secondary indexes on that table are also deleted. For maximum query flexibility, you can create up to five global secondary indexes per table.

You must specify one attribute for the index hash key; you can optionally specify another attribute for the index range key. It is not necessary for either of these key attributes to be the same as a key attribute in the table. For example, neither TopScore nor TopScoreDateTime is a key attribute in the GameScores table; you could create a global secondary index with a hash key of TopScore and a range key of TopScoreDateTime. You might use such an index to determine whether there is a correlation between high scores and the time of day a game is played.

Each index key attribute must be a scalar data type, not a multi-value set. You can project attributes of any data type into a global secondary index; this includes scalar data types and multi-valued sets. For a complete list of data types, see DynamoDB Data Types.

In the GlobalSecondaryIndexes parameter of CreateTable, you must provide ProvisionedThroughput settings for the index, consisting of ReadCapacityUnits and WriteCapacityUnits. These provisioned throughput settings are separate from those of the table, but behave in similar ways. For more information, see Provisioned Throughput Considerations for Global Secondary Indexes.
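To make the shape of the request concrete, here is a sketch of CreateTable parameters for GameScores with GameTitleIndex, written as a plain Python dictionary. The attribute types (S and N), the capacity values, and the choice of an INCLUDE projection with Wins and Losses are assumptions for illustration. A dictionary like this could be passed to an SDK call such as boto3's client.create_table(**create_table_params), but no SDK is needed to run the sketch itself.

```python
create_table_params = {
    "TableName": "GameScores",
    # Every attribute used in a table or index key schema must be defined here.
    "AttributeDefinitions": [
        {"AttributeName": "UserId", "AttributeType": "S"},     # assumed type
        {"AttributeName": "GameTitle", "AttributeType": "S"},  # assumed type
        {"AttributeName": "TopScore", "AttributeType": "N"},   # assumed type
    ],
    "KeySchema": [
        {"AttributeName": "UserId", "KeyType": "HASH"},
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},
    ],
    # Table throughput (illustrative values).
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "GameTitleIndex",
            "KeySchema": [
                {"AttributeName": "GameTitle", "KeyType": "HASH"},
                {"AttributeName": "TopScore", "KeyType": "RANGE"},
            ],
            "Projection": {
                "ProjectionType": "INCLUDE",
                "NonKeyAttributes": ["Wins", "Losses"],
            },
            # Index throughput is specified separately from the table's.
            "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
        }
    ],
}

# Sanity check: all key attributes (table and index) have a definition.
defined = {d["AttributeName"] for d in create_table_params["AttributeDefinitions"]}
key_attrs = {k["AttributeName"] for k in create_table_params["KeySchema"]}
for gsi in create_table_params["GlobalSecondaryIndexes"]:
    key_attrs |= {k["AttributeName"] for k in gsi["KeySchema"]}
missing = key_attrs - defined
print(sorted(missing))  # []
```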

Querying a Global Secondary Index

You can access the data in a global secondary index only by using the Query operation. The query must specify the name of the table and the name of the index that you want to use, the attributes to be returned in the query results, and any query conditions that you want to apply. DynamoDB can return the results in ascending or descending order.

Consider the following Query request, which retrieves gaming data for a leaderboard application:

{
    "TableName": "GameScores",
    "IndexName": "GameTitleIndex",
    "KeyConditionExpression": "GameTitle = :v_title",
    "ExpressionAttributeValues": {
        ":v_title": {"S": "Meteor Blasters"}
    },
    "ProjectionExpression": "UserId, TopScore",
    "ScanIndexForward": false
}

In this query:

  • DynamoDB accesses GameTitleIndex, using the GameTitle hash key to locate the index items for Meteor Blasters. All of the index items with this key are stored adjacent to each other for rapid retrieval.

  • DynamoDB then uses the index to read the user IDs and top scores for this game.

  • The results are returned, sorted in descending order because the ScanIndexForward parameter is set to false.
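The same request can be assembled programmatically. The sketch below builds the Query parameters as a Python dictionary, using a key condition expression; the function name build_leaderboard_query is hypothetical, the string type for GameTitle is an assumption, and the optional Limit parameter is an addition not used in the example above. A dictionary like this could be passed to an SDK call such as boto3's client.query(**params).

```python
def build_leaderboard_query(game_title, limit=None):
    """Build Query parameters that read one game's scores, highest first."""
    params = {
        "TableName": "GameScores",
        "IndexName": "GameTitleIndex",
        # Match on the index hash key only; all range key values qualify.
        "KeyConditionExpression": "GameTitle = :title",
        "ExpressionAttributeValues": {":title": {"S": game_title}},  # assumed type S
        "ProjectionExpression": "UserId, TopScore",
        # False returns results in descending TopScore order.
        "ScanIndexForward": False,
    }
    if limit is not None:
        params["Limit"] = limit  # cap the number of items read
    return params

top_score_query = build_leaderboard_query("Meteor Blasters", limit=1)
print(top_score_query["IndexName"], top_score_query["Limit"])  # GameTitleIndex 1
```

With descending order and a limit of 1, the single item read would be the top score for the game.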

Data Synchronization Between Tables and Global Secondary Indexes

DynamoDB automatically synchronizes each global secondary index with its parent table. When an application writes or deletes items in a table, any global secondary indexes on that table are updated asynchronously, using an eventually consistent model. Applications never write directly to an index. However, it is important that you understand the implications of how DynamoDB maintains these indexes.

Changes to the table data are propagated to the global secondary indexes within a fraction of a second, under normal conditions. However, in some unlikely failure scenarios, longer propagation delays might occur. Because of this, your applications need to anticipate and handle situations where a query on a global secondary index returns results that are not up to date.

If you write an item to a table, you don't have to specify the attributes for any global secondary index range key. Using GameTitleIndex as an example, you would not need to specify a value for the TopScore attribute in order to write a new item to the GameScores table. In this case, Amazon DynamoDB does not write any data to the index for this particular item.

A table with many global secondary indexes will incur higher costs for write activity than tables with fewer indexes. For more information, see Provisioned Throughput Considerations for Global Secondary Indexes.

Provisioned Throughput Considerations for Global Secondary Indexes

When you create a global secondary index, you must specify read and write capacity units for the expected workload on that index. The provisioned throughput settings of a global secondary index are separate from those of its parent table. A Query operation on a global secondary index consumes read capacity units from the index, not the table. When you put, update or delete items in a table, the global secondary indexes on that table are also updated; these index updates consume write capacity units from the index, not from the table.

For example, if you Query a global secondary index and exceed its provisioned read capacity, your request will be throttled. If you perform heavy write activity on the table, but a global secondary index on that table has insufficient write capacity, then the write activity on the table will be throttled.

To view the provisioned throughput settings for a global secondary index, use the DescribeTable operation; detailed information about all of the table's global secondary indexes will be returned.
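As a sketch of how the DescribeTable output might be consumed, the following function pulls the throughput settings of each global secondary index out of a response dictionary. The function name index_throughput is hypothetical, and the sample response is a trimmed, made-up example that contains only the fields the function reads.

```python
def index_throughput(describe_table_response):
    """Map each index name to its (read, write) provisioned capacity units."""
    table = describe_table_response["Table"]
    result = {}
    # Tables without global secondary indexes omit this list.
    for gsi in table.get("GlobalSecondaryIndexes", []):
        throughput = gsi["ProvisionedThroughput"]
        result[gsi["IndexName"]] = (
            throughput["ReadCapacityUnits"],
            throughput["WriteCapacityUnits"],
        )
    return result

# Trimmed sample response; the capacity values are illustrative.
sample_response = {
    "Table": {
        "TableName": "GameScores",
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GameTitleIndex",
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 20,
                    "WriteCapacityUnits": 8,
                },
            }
        ],
    }
}

settings = index_throughput(sample_response)
print(settings)  # {'GameTitleIndex': (20, 8)}
```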

Read Capacity Units

Global secondary indexes support eventually consistent reads, each of which consumes one half of a read capacity unit. This means that a single global secondary index query can retrieve up to 2 × 4 KB = 8 KB per read capacity unit.

For global secondary index queries, DynamoDB calculates the provisioned read activity in the same way as it does for queries against tables. The only difference is that the calculation is based on the sizes of the index entries, rather than the size of the item in the table. The number of read capacity units is the sum of all projected attribute sizes across all of the items returned; the result is then rounded up to the next 4 KB boundary. For more information on how DynamoDB calculates provisioned throughput usage, see Specifying Read and Write Requirements for Tables.

The maximum size of the results returned by a Query operation is 1 MB; this includes the sizes of all the attribute names and values across all of the items returned.

For example, consider a global secondary index where each item contains 2000 bytes of data. Now suppose that you Query this index and that the query returns 8 items. The total size of the matching items is 2000 bytes × 8 items = 16,000 bytes; this is then rounded up to the next 4 KB boundary, which is 16 KB. Since global secondary index queries are eventually consistent, the total cost is 0.5 × (16 KB / 4 KB), or 2 read capacity units.
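The arithmetic in this example can be written as a short function. This is a sketch of the calculation as described here, assuming that 4 KB means 4096 bytes; the function name and parameter names are hypothetical.

```python
import math

def query_read_capacity_units(item_size_bytes, item_count,
                              block_bytes=4096, units_per_block=0.5):
    """Estimate read capacity units for an eventually consistent index query.

    The total size of the returned index entries is rounded up to the next
    4 KB boundary; each 4 KB block then costs half a read capacity unit.
    """
    total_bytes = item_size_bytes * item_count
    blocks = math.ceil(total_bytes / block_bytes)
    return blocks * units_per_block

# 8 items of 2000 bytes = 16,000 bytes -> 4 blocks of 4 KB -> 2 units.
print(query_read_capacity_units(2000, 8))  # 2.0
```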

Write Capacity Units

When an item in a table is added, updated, or deleted, and a global secondary index is affected by this, then the global secondary index will consume provisioned write capacity units for the operation. The total provisioned throughput cost for a write consists of the sum of write capacity units consumed by writing to the table and those consumed by updating the global secondary indexes. Note that if a write to a table does not require a global secondary index update, then no write capacity is consumed from the index.

In order for a table write to succeed, the provisioned throughput settings for the table and all of its global secondary indexes must have enough write capacity to accommodate the write; otherwise, the write to the table will be throttled. Even if no data needs to be written to a particular global secondary index, the table write will be throttled if that index has insufficient write capacity.

The cost of writing an item to a global secondary index depends on several factors:

  • If you write a new item to the table that defines an indexed attribute, or you update an existing item to define a previously undefined indexed attribute, one write operation is required to put the item into the index.

  • If an update to the table changes the value of an indexed key attribute (from A to B), two writes are required: one to delete the previous item from the index and another to put the new item into the index.

  • If an item was present in the index, but a write to the table caused the indexed attribute to be deleted, one write is required to delete the old item projection from the index.

  • If an item is not present in the index before or after the item is updated, there is no additional write cost for the index.

  • If an update to the table changes only the value of projected attributes in the index, but does not change the value of any indexed key attribute, then one write is required to update the values of the projected attributes in the index.

All of these factors assume that the size of each item in the index is less than or equal to the 1 KB item size for calculating write capacity units. Larger index entries will require additional write capacity units. You can minimize your write costs by considering which attributes your queries will need to return and projecting only those attributes into the index.
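The rules above can be summarized as a small decision function. This is a sketch only: index_write_count is a hypothetical helper that counts index writes for one table write, and it assumes that every index entry fits within the 1 KB unit size, so each write costs one write capacity unit. A value of None stands for an item that does not exist before or after the write.

```python
def index_write_count(old_item, new_item, index_keys, projected_attrs=()):
    """Count the index writes caused by replacing old_item with new_item."""
    def in_index(item):
        # An item appears in the index only if it defines every index key.
        return item is not None and all(key in item for key in index_keys)

    was_indexed, is_indexed = in_index(old_item), in_index(new_item)
    if not was_indexed and not is_indexed:
        return 0  # not in the index before or after: no index write
    if was_indexed != is_indexed:
        return 1  # one put (newly indexed) or one delete (no longer indexed)
    if any(old_item[key] != new_item[key] for key in index_keys):
        return 2  # key value changed: delete the old entry, put the new one
    if any(old_item.get(a) != new_item.get(a) for a in projected_attrs):
        return 1  # only projected attributes changed: update in place
    return 0      # nothing the index stores has changed

keys = ["GameTitle", "TopScore"]
before = {"UserId": "101", "GameTitle": "Comet Quest", "TopScore": 10, "Wins": 1}
after = dict(before, TopScore=25)
print(index_write_count(before, after, keys, ["Wins"]))  # 2
```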

Storage Considerations for Global Secondary Indexes

When an application writes an item to a table, DynamoDB automatically copies the correct subset of attributes to any global secondary indexes in which those attributes should appear. Your AWS account is charged for storage of the item in the table and also for storage of attributes in any global secondary indexes on that table.

The amount of space used by an index item is the sum of the following:

  • The size in bytes of the table primary key (hash and range key attributes)

  • The size in bytes of the index key attribute

  • The size in bytes of the projected attributes (if any)

  • 100 bytes of overhead per index item

To estimate the storage requirements for a global secondary index, you can estimate the average size of an item in the index and then multiply by the number of items in the table that have the global secondary index key attributes.
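The estimate described above might be sketched as follows. The function names are hypothetical, and the byte counts in the example are made-up values chosen only to show the arithmetic; the 100-byte overhead comes from the list above.

```python
def index_item_bytes(table_key_bytes, index_key_bytes,
                     projected_bytes=0, overhead_bytes=100):
    """Size of one index item: keys, projected attributes, and overhead."""
    return table_key_bytes + index_key_bytes + projected_bytes + overhead_bytes

def estimate_index_storage(avg_item_bytes, indexed_item_count):
    """Total index size: average item size times the number of indexed items.

    Only table items that define the index key attributes are counted.
    """
    return avg_item_bytes * indexed_item_count

# Assumed sizes: 30 bytes of table keys, 20 bytes of index key,
# 50 bytes of projected attributes.
avg_bytes = index_item_bytes(30, 20, projected_bytes=50)
total_bytes = estimate_index_storage(avg_bytes, 1_000_000)
print(avg_bytes, total_bytes)  # 200 200000000
```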

If a table contains an item where a particular attribute is not defined, but that attribute is defined as an index hash key or range key, then DynamoDB does not write any data for that item to the index.