In addition to using primary keys to access and manipulate items, Amazon DynamoDB also
provides two APIs for searching the data: Query and Scan.
A Query operation finds items in a table using only primary
key attribute values. You must provide a hash key attribute name and a distinct
value to search for. You can optionally provide a range key attribute name and
value, and use a comparison operator to refine the search results. By default, a
Query returns all of the data attributes for items with the
specified primary key(s); however, you can use the
ProjectionExpression parameter so that the
Query only returns some of the attributes, rather than all of them.
Query supports a specific set of comparison operators for
choosing key values. You must specify the hash key attribute name and value as an
equality condition. You can optionally specify a second condition, referring to the
range key attribute; this condition allows you to choose from several conditional
operators. For information about the available comparison operators, go to
Query in the Amazon DynamoDB API Reference and refer to the KeyConditionExpression parameter.
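As a rough sketch of the rules above, the following Python builds keyword arguments for a low-level Query against the Thread table used later in this section (hash key ForumName, range key Subject). It assumes the arguments are passed to a boto3 low-level client, as in `client.query(**params)`; the function name, the choice of `begins_with` as the range key operator, and the projected attributes are illustrative assumptions.

```python
def build_thread_query(forum_name, subject_prefix=None):
    """Return keyword arguments for a low-level DynamoDB Query on Thread.

    The hash key (ForumName) is always an equality condition. The optional
    range key condition on Subject uses begins_with, one of the operators
    a key condition may apply to the range key.
    """
    key_condition = "ForumName = :f"
    values = {":f": {"S": forum_name}}
    if subject_prefix is not None:
        key_condition += " AND begins_with(Subject, :s)"
        values[":s"] = {"S": subject_prefix}
    return {
        "TableName": "Thread",
        "KeyConditionExpression": key_condition,
        "ExpressionAttributeValues": values,
        # Return only these attributes rather than every attribute of each item.
        "ProjectionExpression": "ForumName, Subject, LastPostedBy",
    }
```

For example, `client.query(**build_thread_query("Amazon DynamoDB", "Tagging"))` would ask for threads in one (hypothetical) forum whose Subject starts with "Tagging".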
If your table has one or more secondary indexes, you can
Query those indexes in the same way that you query a table.
For more information, see Improving Data Access with Secondary Indexes in DynamoDB.
A single Query request can retrieve a maximum of
1 MB of data; DynamoDB can optionally apply a filter expression to this data,
narrowing the results before they are returned to the user. (For more information on
filters, see Narrowing the Results with Filter Expressions.)
A Query operation always returns a result set, but if no
matching items are found, the result set will be empty.
For items with a given hash key, DynamoDB stores those items in sorted order by
range key. In a
Query, DynamoDB retrieves the items in sorted
order, and then processes the items using
any filter expressions that may be present. Only then are the
results sent back to the client.
Query results are always sorted by the range key. If
the data type of the range key is Number, the results are returned in numeric order;
otherwise, the results are returned in order of ASCII character code values. By
default, the sort order is ascending. To reverse the order, set the
ScanIndexForward parameter to false.
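The difference between numeric and character ordering is easy to miss, so here is a small local model (not a DynamoDB call) of the ascending order described above, plus a hypothetical request that reverses it. The helper names are illustrative; Python string comparison uses code points, which for ASCII text matches the character-code order the text describes.

```python
from decimal import Decimal


def range_key_order(values, key_type):
    """Model the ascending order described above (not a DynamoDB call).

    key_type "N" (Number) compares numerically; other types are compared
    character by character, which for ASCII strings matches code order.
    """
    if key_type == "N":
        # Low-level Number values are strings such as "10"; compare them as numbers.
        return sorted(values, key=Decimal)
    return sorted(values)


def descending_query(forum_name):
    """Query kwargs that return Thread items in descending range-key order."""
    return {
        "TableName": "Thread",
        "KeyConditionExpression": "ForumName = :f",
        "ExpressionAttributeValues": {":f": {"S": forum_name}},
        # The default (True) is ascending; False reverses the order.
        "ScanIndexForward": False,
    }
```

With the values "10", "9", "100", "2", Number ordering gives 2, 9, 10, 100, while string ordering gives "10", "100", "2", "9".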
A Scan operation examines every item in a table or a secondary index. By default, a
Scan returns all of the data attributes for every
item in the table or index. You can use the
ProjectionExpression parameter so that the
Scan only returns some of the attributes, rather than all of them.
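A minimal sketch of a projected Scan, assuming `client` is a boto3 low-level DynamoDB client (for example `boto3.client("dynamodb")`). The helper name is hypothetical, and it reads only the first page of results; paging is covered further on in this section.

```python
def scan_projected(client, table_name, attributes):
    """Scan one page of a table, returning only the named attributes."""
    response = client.scan(
        TableName=table_name,
        # Comma-separated list of top-level attributes to return. Names that
        # are reserved words would need ExpressionAttributeNames instead.
        ProjectionExpression=", ".join(attributes),
    )
    return response.get("Items", [])
```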
A single Scan request can retrieve a maximum of
1 MB of data; DynamoDB can optionally apply a filter expression to this
data, narrowing the results before they are returned to the user. (For more
information on filters, see Narrowing the Results with Filter Expressions.)
A Scan operation always returns a result set, but if no
matching items are found, the result set will be empty.
With a Query or a
Scan operation, you can
specify an optional filter expression to refine the results returned to you. A
filter expression lets you apply conditions to the data, after
it is queried or scanned, but before it is returned to you. Only the items that meet
your conditions are returned to you.
Following are some examples of filter expressions. Note that these expressions
use placeholders (such as
:name) instead of actual values. For more information,
see Expression Attribute Names and Expression Attribute Values.
Query the Thread table for a particular ForumName (hash key) and Subject (range key). Of the items that are found, return only the most popular discussion threads — for example, those threads with more than a certain number of Views.
#V > :num
Note that Views is a reserved word in DynamoDB (see Reserved Words in DynamoDB), so we use an expression attribute name as a substitution.
Scan the Thread table and return only the items that were last posted to by a particular user.
LastPostedBy = :name
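The two filter expressions above could be attached to requests roughly as follows. This is a sketch assuming the low-level request format that boto3 passes through; the function names and the sample values are hypothetical, and the Query is keyed only on ForumName for brevity.

```python
def popular_threads_query(forum_name, min_views):
    """Query kwargs: threads in one forum with more than min_views views."""
    return {
        "TableName": "Thread",
        "KeyConditionExpression": "ForumName = :f",
        # Views is a reserved word, so the expression refers to it as #V.
        "FilterExpression": "#V > :num",
        "ExpressionAttributeNames": {"#V": "Views"},
        "ExpressionAttributeValues": {
            ":f": {"S": forum_name},
            # Low-level API: numbers travel as strings inside {"N": ...}.
            ":num": {"N": str(min_views)},
        },
    }


def last_posted_by_scan(user_name):
    """Scan kwargs: Thread items whose last post was by user_name."""
    return {
        "TableName": "Thread",
        "FilterExpression": "LastPostedBy = :name",
        "ExpressionAttributeValues": {":name": {"S": user_name}},
    }
```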
The syntax for a
FilterExpression is identical to that of a
ConditionExpression. In addition, FilterExpression uses
the same comparators, functions, and logical operators as
ConditionExpression. For more information, see Condition Expression Reference.
A single Query or Scan operation can retrieve a maximum of 1 MB of data. This limit applies before any filter expression is applied to the results.
When you create a table, you specify your read and write capacity unit requirements. If you add a global secondary index to the table, you must also provide the throughput requirements for that index.
You can use Query and
Scan operations on
secondary indexes in the same way that you use these operations on a table. If you
Query or Scan a local secondary index, then capacity units
are consumed from the table's provisioned throughput. However, if you perform these
operations on a global secondary index, capacity units are consumed from the provisioned throughput of
the index. This is because a global secondary index has its own provisioned throughput settings, separate
from those of its table.
For more information about how DynamoDB computes the capacity units consumed by your operation, see Capacity Units Calculations for Various Operations.
For Query and Scan operations, DynamoDB
calculates the amount of consumed provisioned throughput based on item size, not on
the amount of data that is returned to an application. For this reason, the number
of capacity units consumed will be the same whether you request all of the
attributes (the default behavior) or just some of them using the ProjectionExpression parameter.
The number of capacity units consumed will also be the same whether or not you specify a FilterExpression.
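One way to observe this is to ask DynamoDB to report the capacity a request consumed. The sketch below assumes a boto3 low-level client and its `ReturnConsumedCapacity` request parameter; the helper name is hypothetical. Running the same Query with and without a ProjectionExpression or FilterExpression should, per the note above, report the same units.

```python
def consumed_read_units(client, **query_params):
    """Run one Query page and return the capacity units it consumed."""
    response = client.query(ReturnConsumedCapacity="TOTAL", **query_params)
    # With TOTAL, the response includes the aggregate units for the request.
    return response["ConsumedCapacity"]["CapacityUnits"]
```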
DynamoDB paginates the results from Query and
Scan operations. With pagination, Query and
Scan results are divided into distinct pieces; an application
can process the first page of results, then the second page, and so on. The data
returned from a Query or
Scan operation is
limited to 1 MB; this means that if you scan a table that has more than 1 MB of data,
you'll need to perform another
Scan operation to continue to the
next 1 MB of data in the table.
If you query or scan for specific attributes that match values that amount to more than
1 MB of data, you'll need to perform another Query or Scan
request for the next 1 MB of data. To do this, take the
LastEvaluatedKey value from the previous request, and use
that value as the
ExclusiveStartKey in the next request. This
will let you progressively query or scan for new data in 1 MB increments.
When the entire result set from a Query or
Scan has been processed, the LastEvaluatedKey is null. This
indicates that the result set is complete (that is, the operation processed the “last page”
of data). If LastEvaluatedKey is anything other than
null, this does not necessarily mean that
there is more data in the result set. The only way to know when you have reached the end
of the result set is when LastEvaluatedKey is null.
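The loop below sketches this pagination pattern. It accepts any callable that behaves like a boto3 low-level `client.query` or `client.scan`. In boto3 responses the last page simply omits LastEvaluatedKey, which plays the role of null here. The helper name is hypothetical. Note that a page may be empty while still carrying a LastEvaluatedKey, so the loop keeps going until the key disappears.

```python
def paginate(operation, **params):
    """Yield every item from a Query or Scan, following LastEvaluatedKey."""
    while True:
        page = operation(**params)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey: the previous page was the last one.
            return
        # Resume the next request right after the last evaluated item.
        params["ExclusiveStartKey"] = last_key
```

Usage, assuming a boto3 client: `for item in paginate(client.scan, TableName="Thread"): ...`.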
The Query and Scan APIs use the Count parameter, which
serves two distinct purposes:
In a request, set the
Count parameter to
true if you want DynamoDB to provide the total number of items
that match the filter expression, instead of a list of the matching items.
In a response, DynamoDB returns a
Count value for the
number of matching items in a request. If the matching items for a filter expression
or query condition exceed 1 MB, Count
contains a partial count of the total number of items that match the request. To
get the full count of items that match, take the
LastEvaluatedKey value from the previous request, and
use that value as the
ExclusiveStartKey in the next
request. Repeat this until DynamoDB no longer returns a LastEvaluatedKey.
Query and Scan operations also return a
ScannedCount value. The ScannedCount
value is the total number of items that were queried or scanned, before any filter expression was applied to the results.
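A sketch of collecting the full counts across pages. The text describes a boolean Count request parameter; the low-level API as exposed by boto3 expresses the same request as `Select="COUNT"`, which is what this sketch assumes. `operation` is any callable shaped like `client.query` or `client.scan`, and the helper name is hypothetical.

```python
def count_matches(operation, **params):
    """Return (Count, ScannedCount) totals across all pages of a Query or Scan."""
    # Ask for counts only, not the matching items themselves.
    params["Select"] = "COUNT"
    count = scanned = 0
    while True:
        page = operation(**params)
        count += page["Count"]          # items that matched on this page
        scanned += page["ScannedCount"]  # items evaluated before filtering
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return count, scanned
        params["ExclusiveStartKey"] = last_key
```

A large gap between the two totals means the filter discarded most of what was read, which still consumed capacity.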
The DynamoDB Query and Scan APIs allow a
Limit value to
restrict the size of the results.
In a request, set the
Limit parameter to the number of items that
you want DynamoDB to process before returning results.
In a response, DynamoDB returns all the matching results within the scope of the
Limit value. For example, if you issue a Query or a Scan
request with a
Limit value of
6 and without a
filter expression, the operation returns the first six items in the table that match the request
parameters. If you also supply a
FilterExpression, the operation
returns the items within the first six items in the table that match the filter requirements.
For either a Query or Scan operation, DynamoDB might return a
LastEvaluatedKey value if the operation did not return all
matching items in the table. To get all of the matching items, take the
LastEvaluatedKey from the previous request and use it as the
ExclusiveStartKey in the next request. Repeat this until
DynamoDB no longer returns a LastEvaluatedKey.
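The interaction between Limit and a filter can be modeled locally. The function below is an illustration of the behavior described above, not a DynamoDB call, and its name and item shape are hypothetical: at most `limit` items are evaluated, and the filter only sees those.

```python
def limit_then_filter(items_in_key_order, limit, keep=None):
    """Model how Limit and a filter interact (not a DynamoDB call).

    Returns (returned_items, scanned_count): at most `limit` items are
    evaluated, and the filter is applied only to those evaluated items.
    """
    evaluated = items_in_key_order[:limit]
    if keep is None:
        returned = list(evaluated)
    else:
        returned = [item for item in evaluated if keep(item)]
    return returned, len(evaluated)
```

With ten items whose Views run from 0 to 9 and Limit set to 6, the unfiltered page holds the first six items. Filtering on Views greater than 3 returns only the items with Views 4 and 5, even though later items also match. Those later matches are reached by following LastEvaluatedKey.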
By default, a Query result is an eventually consistent read, but you can
request a strongly consistent read instead. An eventually consistent read might not
reflect the results of a recently completed
UpdateItem operation. For more information, see Data Read and Consistency Considerations.
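Requesting a strongly consistent Query is a single flag on the request. The sketch below assumes a boto3 low-level client and its `ConsistentRead` parameter; the helper name and the Thread key condition are illustrative. Strongly consistent reads are not supported on global secondary indexes, so this applies to the table itself (or to a local secondary index).

```python
def strongly_consistent_query(client, forum_name):
    """Read one page of Thread items for a forum with strong consistency."""
    response = client.query(
        TableName="Thread",
        KeyConditionExpression="ForumName = :f",
        ExpressionAttributeValues={":f": {"S": forum_name}},
        # Default is an eventually consistent read; True requests a strong one.
        ConsistentRead=True,
    )
    return response.get("Items", [])
```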
A Scan result is an eventually consistent read, meaning that
changes to data immediately before the scan takes place might not be included in the
scan results. The result set will not contain any duplicate items.
Generally, a Query operation is more efficient than a Scan operation.
A Scan operation always scans the entire table or secondary index, then
filters out values to provide the desired result, essentially adding the extra step of
removing data from the result set. Avoid using a
Scan operation on
a large table or index with a filter that removes many results, if possible. Also, as a
table or index grows, the
Scan operation slows. The
Scan operation examines every item for the requested values,
and can use up the provisioned throughput for a large table or index in a single
operation. For faster response times, design your tables and indexes so that your
applications can use
Query instead of Scan.
(For tables, you can also consider reading items directly by primary key with GetItem and BatchGetItem.)
Alternatively, design your application to use
Scan operations in
a way that minimizes the impact on your table's request rate. For more information, see
Guidelines for Query and Scan.
A Query operation searches for a specific range of keys that
satisfy a given set of key conditions. If you specify a filter expression, then DynamoDB must
perform the extra step of removing data from the result set. A
Query operation seeks the specified composite primary key, or
range of keys, until one of the following events occurs:
The result set is exhausted.
The number of items retrieved reaches the value of the Limit parameter, if specified.
The amount of data retrieved reaches the maximum result set size limit of 1 MB.
Query performance depends on the amount of data retrieved,
rather than the overall number of primary keys in a table or secondary index. The parameters for a
Query operation (and consequently the number of matching keys)
determine the performance of the query. For example, a query on a table that contains
a large set of range key elements for a single hash key element can be more efficient
than a query on another table that has fewer range key elements per hash key element, if the
number of matching keys in the first table is fewer than in the second. The total number
of primary keys, in either table, does not determine the efficiency of a
Query operation. A filter expression can also impact the efficiency of a
Query, because the items that don't match the filter must be
removed from the result set. Avoid using a
Query operation on a
large table or secondary index with a filter that removes many results, if possible.
If a specific hash key element has a large range key element set, and the results cannot
be retrieved in a single
Query request, the
ExclusiveStartKey continuation parameter allows you to submit
a new query request from the last retrieved item without re-processing the data already retrieved.
By default, the
Scan operation processes data sequentially. DynamoDB
returns data to the application in 1 MB increments, and an application performs additional
Scan operations to retrieve the next
1 MB of data.
The larger the table or secondary index, the more time the
Scan will take to
complete. In addition, a sequential
Scan might not always be able to
fully utilize the provisioned read throughput capacity: Even though DynamoDB
distributes a large table's data across multiple physical partitions, a
Scan operation can only read one partition at a time. For this
reason, the throughput of a
Scan is constrained by the maximum
throughput of a single partition.
To address these issues, the
Scan operation can logically divide a
table or secondary index into multiple segments, with multiple application workers
scanning the segments in parallel. Each worker can be a thread (in programming languages
that support multithreading) or an operating system process. To perform a parallel scan,
each worker issues its own
Scan request with the following parameters:
Segment: A segment to be scanned by a particular worker. Each
worker should use a different value for Segment.
TotalSegments: The total number of segments for the parallel scan. This
value must be the same as the number of workers that your application will use.
The following diagram shows how a multithreaded application performs a parallel
Scan with three degrees of parallelism:
In this diagram, the application spawns three threads and assigns each thread a
number. (Segments are zero-based, so the first number is always 0.) Each thread issues a
Scan request, setting
Segment to its designated number and setting
TotalSegments to 3. Each thread scans its designated segment,
retrieving data 1 MB at a time, and returns the data to the application's main thread.
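The three-thread arrangement above might look like this in Python. This is a sketch assuming a single boto3 low-level client shared across threads (boto3 documents low-level clients as thread-safe). The function names are hypothetical, and each worker follows LastEvaluatedKey within its own segment.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Scan one segment to completion and return its items."""
    params = {
        "TableName": table_name,
        "Segment": segment,               # zero-based segment for this worker
        "TotalSegments": total_segments,  # same value in every worker
    }
    items = []
    while True:
        page = client.scan(**params)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        params["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan(client, table_name, total_segments=3):
    """Run one thread per segment and gather the results in segment order."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, seg, total_segments)
            for seg in range(total_segments)
        ]
        results = []
        for future in futures:
            results.extend(future.result())
    return results
```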
The values for Segment and
TotalSegments apply to individual
Scan requests, and you can use different values at any time. You
might need to experiment with these values, and the number of workers you use, until your
application achieves its best performance.
A parallel scan with a large number of workers can easily consume all of a table's provisioned throughput; it is best to avoid such scans if the table is also incurring heavy read or write activity from other applications.
To control the amount of data returned per request, use the Limit
parameter. This can help prevent situations where one worker consumes all of the
provisioned throughput, at the expense of all other workers. For more information, see "Reduce Page Size" in Avoid Sudden Bursts of Read Activity.
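A parallel-scan worker can apply Limit to every request it issues, as sketched below. The sketch assumes a boto3 low-level client. The generator name and the default page size of 100 are hypothetical starting points rather than recommended values; as noted above, tuning usually takes experimentation.

```python
def scan_segment_in_small_pages(client, table_name, segment, total_segments,
                                page_size=100):
    """Yield items from one segment, evaluating at most page_size items per request."""
    params = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
        # A small Limit keeps each request short, so one worker is less
        # likely to use up the provisioned throughput the others need.
        "Limit": page_size,
    }
    while True:
        page = client.scan(**params)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        params["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```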