Amazon DynamoDB
Developer Guide (API Version 2012-08-10)

Query and Scan Operations

In addition to using primary keys to access and manipulate items, Amazon DynamoDB also provides two APIs for searching the data: Query and Scan.

  • Query

    A Query operation finds items in a table using only primary key attribute values. You must provide a hash key attribute name and a distinct value to search for. You can optionally provide a range key attribute name and value, and use a comparison operator to refine the search results. By default, a Query returns all of the data attributes for items with the specified primary key(s); however, you can use the ProjectionExpression parameter so that the Query only returns some of the attributes, rather than all of them.

    Query supports a specific set of comparison operators for choosing key values. You must specify the hash key attribute name and value as an equality condition. You can optionally specify a second condition that refers to the range key attribute; for this condition you can choose from several comparison operators. For information about the available comparison operators, go to Query in the Amazon DynamoDB API Reference and refer to the KeyConditions parameter.

    Tip

    If your table has one or more secondary indexes, you can Query those indexes in the same way that you query a table. For more information, see Improving Data Access with Secondary Indexes.

    A single Query request can retrieve a maximum of 1 MB of data; DynamoDB can optionally apply a filter to this data, narrowing the results before they are returned to the user. (For more information on filters, see Using a Filter With Query and Scan.)

    A Query operation always returns a result set, but if no matching items are found, the result set will be empty.

    For items with a given hash key, DynamoDB stores those items in sorted order by range key. In a Query, DynamoDB retrieves the items in sorted order, and then processes the items using KeyConditions and any filters that may be present. Only then are the Query results sent back to the client.

    Query results are always sorted by the range key. The order depends on the range key's data type:

      • Number: results are returned in numeric order.

      • String: results are returned in order of UTF-8 bytes. For plain ASCII text, this is the same as ASCII character-code order.

      • Binary: each byte is compared as an unsigned value.

    By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.

  • Scan

    A Scan operation examines every item in the table. By default, a Scan returns all of the data attributes for every item; however, you can use the ProjectionExpression parameter so that the Scan only returns some of the attributes, rather than all of them.

    A single Scan request can retrieve a maximum of 1 MB of data; DynamoDB can optionally apply a filter to this data, narrowing the results before they are returned to the user. (For more information on filters, see Using a Filter With Query and Scan.)

    A Scan operation always returns a result set, but if no matching items are found, the result set will be empty.
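The following Python sketch shows how these Query parameters fit together in a single request. It assumes `client` is a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`) or any object with the same `query(**kwargs)` method. The Thread table, its attributes, and the `query_threads` helper are illustrative. It uses `KeyConditionExpression`, the expression-based alternative to the `KeyConditions` parameter mentioned above.

```python
from typing import Any, Dict, Optional


def query_threads(client: Any, forum_name: str,
                  subject_prefix: Optional[str] = None,
                  newest_first: bool = False) -> Dict[str, Any]:
    """Query a hypothetical Thread table (ForumName = hash key, Subject = range key)."""
    # The hash key condition must always be an equality test.
    key_condition = "ForumName = :forum"
    values: Dict[str, Any] = {":forum": {"S": forum_name}}

    # Optional second condition on the range key.
    if subject_prefix is not None:
        key_condition += " AND begins_with(Subject, :prefix)"
        values[":prefix"] = {"S": subject_prefix}

    return client.query(
        TableName="Thread",
        KeyConditionExpression=key_condition,
        ExpressionAttributeValues=values,
        # Return only these attributes instead of whole items.
        ProjectionExpression="ForumName, Subject, LastPostedBy",
        # Results are sorted by range key; False returns them in descending order.
        ScanIndexForward=not newest_first,
    )
```

To query a secondary index instead, you would add `IndexName` to the request and write the key condition against that index's key attributes.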

Using a Filter With Query and Scan

With a Query or a Scan operation, you can specify an optional filter to refine the results returned to you. A filter lets you apply conditional expressions to the data, after it is queried or scanned, but before it is returned to the user.

A single Query or Scan operation can retrieve a maximum of 1 MB of data. This limit applies before a filter is applied to the results.

You can use a filter to choose only the items whose attribute values meet your criteria. For example, in a discussion forum application, you could Query the Thread table for a particular ForumName (hash key) and Subject (range key). Of the matching items that are found, you could use the FilterExpression parameter so that only the most popular discussion threads are returned: for example, those threads with more than 1000 Views. As another example, you could Scan the Thread table and use a FilterExpression to return only those threads that have been Answered.

Note

The syntax for a FilterExpression is identical to that of a ConditionExpression. In addition, FilterExpression and ConditionExpression use the same comparators, functions, and logical operators. For more information, see Condition Expression Reference.
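As a sketch of the two forum examples above, the Python functions below attach a `FilterExpression` to a Query and to a Scan. They assume a boto3-style low-level client. The Thread schema is hypothetical; in particular, `Answered` is assumed to be a Boolean attribute. The `#views` placeholder in `ExpressionAttributeNames` prevents problems if the attribute name collides with a DynamoDB reserved word.

```python
from typing import Any, Dict


def popular_threads(client: Any, forum_name: str, min_views: int = 1000) -> Dict[str, Any]:
    """Query one forum's threads, keeping only those with more than min_views Views."""
    return client.query(
        TableName="Thread",
        KeyConditionExpression="ForumName = :forum",
        # Evaluated after the (at most 1 MB) read, before results are returned.
        FilterExpression="#views > :min_views",
        # Placeholder for the attribute name, in case it is a reserved word.
        ExpressionAttributeNames={"#views": "Views"},
        ExpressionAttributeValues={
            ":forum": {"S": forum_name},
            # The low-level API sends numbers as strings.
            ":min_views": {"N": str(min_views)},
        },
    )


def answered_threads(client: Any) -> Dict[str, Any]:
    """Scan the Thread table, keeping only answered threads.

    Assumes Answered is stored as a Boolean attribute (hypothetical schema).
    """
    return client.scan(
        TableName="Thread",
        FilterExpression="Answered = :yes",
        ExpressionAttributeValues={":yes": {"BOOL": True}},
    )
```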

Capacity Units Consumed by Query and Scan

When you create a table, you specify your read and write capacity unit requirements. When you issue a Query or a Scan request on a table, you consume the capacity units that you allocated for that table.

With a Query operation, you can retrieve data from a secondary index in the same way you query a table. If you Query a local secondary index, then capacity units are consumed from the table's provisioned throughput. However, for queries against a global secondary index, capacity units are consumed from the provisioned throughput of the index. This is because a global secondary index has its own provisioned throughput settings, separate from those of its table.

For more information about how DynamoDB computes the capacity units consumed by your operation, see Capacity Units Calculations for Various Operations.

Note

For Query and Scan operations, DynamoDB calculates the amount of consumed provisioned throughput based on item size, not on the amount of data that is returned to an application.

For this reason, the number of capacity units consumed will be the same whether you request all of the attributes (the default behavior) or just some of them using the ProjectionExpression parameter.

The number of capacity units consumed will also be the same whether or not you specify a FilterExpression.
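A small calculation makes the note above concrete. This sketch relies on read-capacity rules that this section does not state; they come from Capacity Units Calculations for Various Operations:

  • One read capacity unit covers a strongly consistent read of up to 4 KB.

  • An eventually consistent read costs half as much.

  • For Query and Scan, the sizes of all items read are summed and then rounded up to the next 4 KB.

The input is the full size of every item read. For that reason, a projection or a filter does not change the result. `read_units_consumed` is a hypothetical helper.

```python
import math
from typing import Iterable

READ_UNIT_BYTES = 4 * 1024  # assumed: one read capacity unit covers 4 KB


def read_units_consumed(item_sizes_bytes: Iterable[int],
                        consistent_read: bool = False) -> float:
    """Estimate read capacity units for one Query or Scan request.

    item_sizes_bytes: full size of every item the request read, before any
    ProjectionExpression or FilterExpression is applied.
    """
    total = sum(item_sizes_bytes)
    # Assumed rule: sum the item sizes, then round up to the next 4 KB.
    units = math.ceil(total / READ_UNIT_BYTES)
    # Assumed rule: eventually consistent reads cost half as much.
    return float(units) if consistent_read else units / 2
```

For example, ten items of 1,500 bytes each total 15,000 bytes, which rounds up to four 4 KB units. A strongly consistent read therefore consumes 4 read capacity units and an eventually consistent read consumes 2. This holds however many attributes are projected and however many items the filter keeps.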

Paginating the Results

DynamoDB paginates the results from Query and Scan operations. With pagination, Query and Scan results are divided into distinct pieces; an application can process the first page of results, then the second page, and so on. The data returned from a Query or Scan operation is limited to 1 MB; this means that if you scan a table that has more than 1 MB of data, you'll need to perform another Scan operation to continue to the next 1 MB of data in the table.

If the items that match your Query amount to more than 1 MB of data, you'll need to perform another Query request for the next 1 MB of data. To do this, take the LastEvaluatedKey value from the previous request, and use that value as the ExclusiveStartKey in the next request. This lets you progressively query or scan for new data in 1 MB increments.

When the entire result set from a Query or Scan has been processed, the LastEvaluatedKey is null. This indicates that the result set is complete (i.e. the operation processed the “last page” of data).

A LastEvaluatedKey that is not null does not necessarily mean that there is more data in the result set; the next request might return no items at all. You have reached the end of the result set only when LastEvaluatedKey is null.
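The loop described above can be sketched in Python as a generator. It keeps issuing requests until `LastEvaluatedKey` is absent from the response, which is how boto3 represents null. The sketch assumes a boto3-style low-level client whose `query` and `scan` methods accept the request parameters as keyword arguments. `iterate_pages` is a hypothetical helper name. boto3 also ships built-in paginators (`client.get_paginator("scan")`) that do the same bookkeeping.

```python
from typing import Any, Dict, Iterator


def iterate_pages(client: Any, operation: str, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every page of a Query or Scan by following LastEvaluatedKey.

    operation is "query" or "scan"; params are passed through unchanged.
    """
    if operation not in ("query", "scan"):
        raise ValueError("operation must be 'query' or 'scan'")
    call = getattr(client, operation)
    request = dict(params)
    while True:
        page = call(**request)
        yield page  # a page may legitimately contain zero items
        last_key = page.get("LastEvaluatedKey")
        # Only a missing (null) LastEvaluatedKey means the result set is complete.
        if last_key is None:
            break
        # Resume exactly where the previous request stopped.
        request["ExclusiveStartKey"] = last_key
```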

Count and ScannedCount

The DynamoDB Query and Scan APIs report item counts in two distinct ways:

  • In a request, set the Select parameter to COUNT if you want DynamoDB to return only the number of items that match (after any filter expression is applied), instead of a list of the matching items.

  • In a response, DynamoDB returns a Count value for the number of matching items in that response. If the data that the operation must read to evaluate the query condition or filter expression exceeds 1 MB, Count contains only a partial count of the total number of items that match the request. To get the full count, take the LastEvaluatedKey value from the previous request, use that value as the ExclusiveStartKey in the next request, and add up the Count values. Repeat this until DynamoDB no longer returns a LastEvaluatedKey.

Query and Scan operations also return a ScannedCount value. The ScannedCount value is the total number of items that were queried or scanned, before any filter was applied to the results.
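The sketch below combines Count, ScannedCount, and pagination, totalling both values across every page. In the 2012-08-10 API, the request-side option for returning only a count is `Select="COUNT"`. As before, `client` is assumed to be a boto3-style low-level client, and `count_matching` is a hypothetical helper.

```python
from typing import Any, Tuple


def count_matching(client: Any, operation: str, **params: Any) -> Tuple[int, int]:
    """Return (total Count, total ScannedCount) for a Query or Scan.

    Select="COUNT" asks DynamoDB for counts instead of items. The per-page
    values are summed because each response covers at most 1 MB of data.
    """
    call = getattr(client, operation)  # "query" or "scan"
    request = dict(params, Select="COUNT")
    count = scanned = 0
    while True:
        page = call(**request)
        count += page["Count"]            # items that matched, after any filter
        scanned += page["ScannedCount"]   # items evaluated, before any filter
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return count, scanned
        request["ExclusiveStartKey"] = last_key
```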

Limit

The DynamoDB Query and Scan APIs allow a Limit value to restrict the number of items that a single request evaluates.

In a request, set the Limit parameter to the number of items that you want DynamoDB to process before returning results.

In a response, DynamoDB returns all the matching results within the scope of the Limit value. For example, if you issue a Query or a Scan request with a Limit value of 6 and without a filter, the operation returns the first six items in the table that match the request parameters. If you also supply a FilterExpression, the operation returns only those of the first six items that also match the filter. That can be fewer than six items, or none at all.

For either a Query or Scan operation, DynamoDB might return a LastEvaluatedKey value if the operation did not return all matching items in the table. To retrieve the remaining matching items, take the LastEvaluatedKey from the previous request and use it as the ExclusiveStartKey in the next request. Repeat this until DynamoDB no longer returns a LastEvaluatedKey.
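The following simplified local model shows how Limit, the 1 MB cap, and a filter interact within a single request. It is not DynamoDB's implementation, and the simplifications are assumptions:

  • Items are plain Python dicts that carry a hypothetical precomputed "size" in bytes.

  • The filter is a Python predicate.

  • A list index stands in for ExclusiveStartKey and LastEvaluatedKey.

  • The request stops once Limit items have been evaluated or the data read reaches 1 MB.

The name `evaluate_page` is illustrative.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_PAGE_BYTES = 1024 * 1024  # 1 MB read limit per request


def evaluate_page(items: List[Dict[str, Any]], start: int = 0,
                  limit: Optional[int] = None,
                  keep: Callable[[Dict[str, Any]], bool] = lambda item: True) -> Dict[str, Any]:
    """Simplified model of one Query or Scan request (not DynamoDB's real code).

    items: already in key order, each with a hypothetical "size" in bytes.
    start: index to resume from (stands in for ExclusiveStartKey).
    keep:  predicate playing the role of a FilterExpression.
    """
    evaluated: List[Dict[str, Any]] = []
    read_bytes = 0
    stopped_early = False
    i = start
    while i < len(items):
        evaluated.append(items[i])
        read_bytes += items[i]["size"]
        i += 1
        # Limit and the 1 MB cap both count items read, before any filtering.
        if (limit is not None and len(evaluated) >= limit) or read_bytes >= MAX_PAGE_BYTES:
            stopped_early = True
            break
    # The filter only sees what was read, so Count can be smaller than Limit.
    matched = [item for item in evaluated if keep(item)]
    page: Dict[str, Any] = {"Items": matched, "Count": len(matched),
                            "ScannedCount": len(evaluated)}
    if stopped_early:
        # Index of the next unread item stands in for LastEvaluatedKey; it can
        # be returned even when nothing is left to read.
        page["LastEvaluatedKey"] = i
    return page
```

For example, take seven items of 100 bytes each whose views are 5, 2000, 10, 3000, 7, 8, and 9000, with Limit set to 6 and a filter of views > 1000. The first call evaluates six items but returns only two (Count 2, ScannedCount 6), plus a continuation position of 6. A second call starting at 6 evaluates the last item and returns no continuation position.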

Read Consistency for Query and Scan

A Query result is an eventually consistent read, but you can request a strongly consistent read instead. An eventually consistent read might not reflect the results of a recently completed PutItem or UpdateItem operation. For more information, see Data Read and Consistency Considerations.

A Scan result is an eventually consistent read, meaning that changes to data immediately before the scan takes place might not be included in the scan results. The result set will not contain any duplicate items.
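For example, a Query can opt into a strongly consistent read with the `ConsistentRead` request parameter. The sketch below assumes a boto3-style low-level client and the same hypothetical Thread table; `query_forum` is an illustrative name. Strongly consistent reads consume twice the read capacity of eventually consistent ones. Queries on a global secondary index support only eventually consistent reads.

```python
from typing import Any, Dict


def query_forum(client: Any, forum_name: str,
                strongly_consistent: bool = False) -> Dict[str, Any]:
    """Query one forum's threads, optionally with a strongly consistent read."""
    return client.query(
        TableName="Thread",
        KeyConditionExpression="ForumName = :forum",
        ExpressionAttributeValues={":forum": {"S": forum_name}},
        # False (the default) is an eventually consistent read; True reflects
        # all writes that succeeded before the read, at twice the capacity cost.
        ConsistentRead=strongly_consistent,
    )
```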

Query and Scan Performance

Generally, a Query operation is more efficient than a Scan operation.

A Scan operation always scans the entire table, then filters out values to provide the desired result, essentially adding the extra step of removing data from the result set. Avoid using a Scan operation on a large table with a filter that removes many results, if possible. Also, as a table grows, the Scan operation slows. The Scan operation examines every item for the requested values, and can use up the provisioned throughput for a large table in a single operation. For quicker response times, design your tables in a way that can use the Query, GetItem, or BatchGetItem APIs, instead. Alternatively, design your application to use Scan operations in a way that minimizes the impact on your table's request rate. For more information, see Guidelines for Query and Scan.

A Query operation searches for a specific range of keys that satisfy a given set of key conditions. If you specify a query filter, then DynamoDB must perform the extra step of removing data from the result set. A Query operation seeks the specified composite primary key, or range of keys, until one of the following events occurs:

  • The result set is exhausted.

  • The number of items retrieved reaches the value of the Limit parameter, if specified.

  • The amount of data retrieved reaches the maximum result set size limit of 1 MB.

Query performance depends on the amount of data retrieved, rather than the overall number of primary keys in a table. The parameters for a Query operation (and consequently the number of matching keys) determine the performance of the query. For example, a query on one table that contains a large set of range key elements for a single hash key element can be more efficient than a query on a table that has fewer range key elements per hash key element, if the number of matching keys in the first table is fewer than in the second. The total number of primary keys, in either table, does not determine the efficiency of a Query operation. A query filter can also impact the efficiency of a Query, because the items that don't match the filter must be removed from the result set. Avoid using a Query operation on a large table with a filter that removes many results, if possible.

If a specific hash key element has a large range key element set, and the results cannot be retrieved in a single Query request, the ExclusiveStartKey continuation parameter allows you to submit a new query request from the last retrieved item without re-processing the data already retrieved.

Parallel Scan

By default, the Scan operation processes data sequentially. DynamoDB returns data to the application in 1 MB increments, and an application performs additional Scan operations to retrieve the next 1 MB of data.

The larger the table, the more time the Scan will take to complete. In addition, a sequential Scan might not always be able to fully utilize the table's provisioned read throughput capacity: even though DynamoDB distributes a large table's data across multiple physical partitions, a Scan operation can only read one partition at a time. For this reason, the throughput of a Scan is constrained by the maximum throughput of a single partition.

To address these issues, the Scan operation can logically divide a table into multiple segments, with multiple application workers scanning the segments in parallel. Each worker can be a thread (in programming languages that support multithreading) or an operating system process. To perform a parallel scan, each worker issues its own Scan request with the following parameters:

  • Segment — A segment to be scanned by a particular worker. Each worker should use a different value for Segment.

  • TotalSegments — The total number of segments for the parallel scan. This value must be the same as the number of workers that your application will use.

The following diagram shows how a multithreaded application performs a parallel Scan with three degrees of parallelism:

In this diagram, the application spawns three threads and assigns each thread a number. (Segments are zero-based, so the first number is always 0.) Each thread issues a Scan request, setting Segment to its designated number and setting TotalSegments to 3. Each thread scans its designated segment, retrieving data 1 MB at a time, and returns the data to the application's main thread.

The values for Segment and TotalSegments apply to individual Scan requests, and you can use different values at any time. You might need to experiment with these values, and the number of workers you use, until your application achieves its best performance.

Note

A parallel scan with a large number of workers can easily consume all of a table's provisioned throughput; it is best to avoid such scans if the table is also incurring heavy read or write activity from other applications.

To control the amount of data returned per request, use the Limit parameter. This can help prevent situations where one worker consumes all of the provisioned throughput, at the expense of all other workers. For more information, see "Reduce Page Size" in Avoid Sudden Bursts of Read Activity.
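The parallel scan described above can be sketched in Python with a thread pool. Each worker scans one segment to completion, following `LastEvaluatedKey`, and the main thread gathers the results. The sketch assumes `client` is a boto3-style low-level client that can be shared across threads; if in doubt, create one client per worker. The helper names and the optional `page_limit` argument, which maps to `Limit`, are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Optional


def scan_segment(client: Any, table: str, segment: int, total_segments: int,
                 page_limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """One worker: scan a single segment to completion, page by page."""
    request: Dict[str, Any] = {
        "TableName": table,
        "Segment": segment,               # this worker's zero-based segment
        "TotalSegments": total_segments,  # the same value for every worker
    }
    if page_limit is not None:
        request["Limit"] = page_limit     # smaller pages soften throughput bursts
    items: List[Dict[str, Any]] = []
    while True:
        page = client.scan(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        request["ExclusiveStartKey"] = last_key


def parallel_scan(client: Any, table: str, total_segments: int,
                  page_limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Scan all segments concurrently, one thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, client, table, seg, total_segments, page_limit)
                   for seg in range(total_segments)]
        results: List[Dict[str, Any]] = []
        for future in futures:
            results.extend(future.result())  # re-raises any worker's exception
    return results
```

For example, `parallel_scan(boto3.client("dynamodb"), "Thread", 3, page_limit=100)` would mirror the three-thread arrangement in the diagram. The results are concatenated segment by segment, so the combined list is not in any meaningful key order.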