Once there is data in an Amazon DynamoDB table, you have two APIs available for searching the data:
Query and Scan.
Query
A Query operation searches only primary key attribute values
and supports a subset of comparison operators on key attribute values to refine the
search. A query returns all of the item data for the matching primary keys
(all of each item's attributes), up to 1 MB of data per query operation. A
Query operation always returns a result set, but the result set can be
empty.
If your table has one or more local secondary indexes, you can
Query those indexes in the same way that you query a table.
For more information, see Local Secondary Indexes.
Query results are always sorted by the range key. If the data type of the range
key is Number, the results are returned in numeric order; otherwise, the results are
returned in order of ASCII character code values. By default, the sort order is
ascending. To reverse the order, set the ScanIndexForward
parameter to false.
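As an illustration of the sort-order control described above, the following Python sketch builds the parameters for a Query that returns items in descending range key order. It assumes the boto3 low-level client and its expression-based parameter names (KeyConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues), which may differ from the API version you are using. The helper name, table name, and key attribute name are hypothetical.

```python
def build_query_params(table_name, hash_key_name, hash_key_value, descending=False):
    """Build keyword arguments for a DynamoDB Query on a single hash key value.

    Assumes a string hash key and the boto3 low-level client parameter format.
    """
    return {
        "TableName": table_name,
        # Placeholders (#h, :h) avoid clashes with DynamoDB reserved words.
        "KeyConditionExpression": "#h = :h",
        "ExpressionAttributeNames": {"#h": hash_key_name},
        "ExpressionAttributeValues": {":h": {"S": hash_key_value}},
        # True (the default) sorts ascending by range key; False reverses it.
        "ScanIndexForward": not descending,
    }


# Hypothetical table and key names, for illustration only.
params = build_query_params("Thread", "ForumName", "Amazon DynamoDB", descending=True)
# With boto3 installed and credentials configured, the request would be:
#   boto3.client("dynamodb").query(**params)
```

The sketch only builds the request. Sending it is left as a comment because that step requires AWS credentials and an existing table.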
Query supports a specific set of comparison operators. For
information about each comparison operator available for query operations, go to the
API entry for Query in the Amazon DynamoDB API Reference.
Scan
A Scan operation examines every item in the table. You can
specify filters to refine the values returned to you; the filters are applied after
the scan has finished. Amazon DynamoDB puts a 1 MB limit on the scan (the limit applies before
the results are filtered). A Scan can result in no table data
meeting the filter criteria.
Scan supports a specific set of comparison operators. For
information about each comparison operator available for scan operations, go to the
API entry for Scan in the Amazon DynamoDB API Reference.
Generally, a Query operation is more efficient than a
Scan operation.
A Scan operation always scans the entire table, then filters out
values to provide the desired result, essentially adding the extra step of removing data
from the result set. Avoid using a Scan operation on a large table
with a filter that removes many results, if possible. Also, as a table grows, the
Scan operation slows. The Scan operation
examines every item for the requested values, and can use up the provisioned throughput
for a large table in a single operation. For quicker response times, design your tables
in a way that can use the Query, GetItem, or
BatchGetItem APIs instead. Alternatively, design your
application to use Scan operations in a way that minimizes the
impact on your table's request rate. For more information, see Guidelines for Working With Tables.
A Query operation only searches for a specific range of keys that
satisfy a given set of key conditions and does not have the added step of filtering out
results. A Query operation seeks the specified composite primary
key, or range of keys, until one of the following events occurs:
The result set is exhausted.
The number of items retrieved reaches the value of the Limit parameter,
if specified.
The amount of data retrieved reaches the maximum result set size limit of 1 MB.
Query performance depends on the amount of data retrieved, rather
than the overall number of primary keys in a table. The parameters for a
Query operation (and consequently the number of matching keys)
determine the performance of the query. For example, a query on one table that contains
a large set of range key elements for a single hash key element can be more efficient
than a query on a table that has fewer range key elements per hash key element, if the
number of matching keys in the first table is fewer than in the second. The total number
of primary keys, in either table, does not determine the efficiency of a
Query operation.
If a specific hash key element has a large range key element set, and the results cannot
be retrieved in a single Query request, the
ExclusiveStartKey continuation parameter allows you to submit
a new query request from the last retrieved item without re-processing the data already
retrieved.
Amazon DynamoDB paginates the results from Query and
Scan operations. With pagination, Query
and Scan results are divided into distinct pieces; an application
can process the first page of results, then the second page, and so on. The data
returned from a Query or Scan operation is
limited to 1 MB; this means that if you scan a table that has more than 1 MB of data,
you'll need to perform another Scan operation to continue to the
next 1 MB of data in the table. Likewise, if the items that match a query
amount to more than 1 MB of data, you'll need to perform another
Query request for the next 1 MB of data. Each subsequent
request uses a starting point (ExclusiveStartKey) based on the
key of the last returned value (LastEvaluatedKey), so you can
progressively query or scan for new data in 1 MB increments. The
LastEvaluatedKey is null when the
entire Query or Scan result set is complete
(i.e. the operation processed the “last page”).
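The pagination loop described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes that operation is a callable such as the query or scan method of a boto3 DynamoDB client, and the helper names iterate_pages and collect_items are hypothetical.

```python
def iterate_pages(operation, **params):
    """Yield each page of a Query or Scan until no LastEvaluatedKey is returned.

    `operation` is a callable such as client.query or client.scan (assumed to
    be a boto3 DynamoDB client method).
    """
    request = dict(params)
    while True:
        page = operation(**request)
        yield page
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break  # the last page has been processed
        # Resume immediately after the last item evaluated on the previous page.
        request["ExclusiveStartKey"] = last_key


def collect_items(operation, **params):
    """Gather the items from every page into a single list."""
    items = []
    for page in iterate_pages(operation, **params):
        items.extend(page.get("Items", []))
    return items
```

Processing pages one at a time through the generator keeps memory use bounded. collect_items is convenient only when the full result set is known to be small.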
The Amazon DynamoDB Scan and Query APIs use the
Count parameter. Count is used for two
distinct purposes:
In a request, set the Count parameter to
true if you want Amazon DynamoDB to provide the total number of items
that match the scan filter or query condition, instead of a list of the matching
items.
In a response, Amazon DynamoDB returns a Count value for the
number of matching items in a request. If the items that match a scan filter
or query condition amount to more than 1 MB, Count contains a partial
count of the total number of items that match the request. To get the full count
of items that match a request, use the LastEvaluatedKey
in a subsequent request. Repeat the request until Amazon DynamoDB no longer returns a
LastEvaluatedKey.
For a Scan operation, Amazon DynamoDB also returns a ScannedCount value.
The ScannedCount value is the total number of items scanned
before any filter is applied to the results.
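To show how a full count can be assembled from partial counts, the following Python sketch sums Count and ScannedCount across pages. It assumes the boto3 low-level client, where a count-only request is expressed as Select="COUNT" rather than as a boolean Count parameter, so adapt it to the API version you are using. The helper name count_matching_items is hypothetical.

```python
def count_matching_items(operation, **params):
    """Return (count, scanned_count) summed over every page of results.

    `operation` is assumed to be client.query or client.scan from boto3.
    """
    request = dict(params, Select="COUNT")  # ask for counts instead of items
    total = scanned = 0
    while True:
        page = operation(**request)
        total += page.get("Count", 0)
        scanned += page.get("ScannedCount", 0)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total, scanned
        # Continue counting from where the previous page stopped.
        request["ExclusiveStartKey"] = last_key
```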
The Amazon DynamoDB Scan and Query APIs allow a Limit value to restrict the
size of the results.
In a request, set the Limit parameter to the number of items that
you want Amazon DynamoDB to process before returning results.
In a response, Amazon DynamoDB returns all the matching results within the scope of the
Limit value. For example, if you provide a
Limit value of 6 for a scan request, the scan
operation examines the first six items in the table and returns those that match the scan
filter requirements (if provided). If no filter is provided, the scan operation returns
the first six items. If you provide a Limit value of
6 for a query request, the query operation processes the first six items in the
table that match the query parameters.
For either a scan or query operation, Amazon DynamoDB might return a
LastEvaluatedKey value if the operation did not return all
matching items in the table. To get the full count of items that match a request in a
table, use the LastEvaluatedKey in a subsequent request. Repeat
the request until Amazon DynamoDB no longer returns a
LastEvaluatedKey.
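The interaction between Limit and a scan filter can be illustrated with a small in-memory simulation in Python. This is a simplified model rather than DynamoDB itself: it ignores the 1 MB limit, uses a plain list in place of a table, and the function name simulate_scan and the sample data are hypothetical.

```python
def simulate_scan(items, limit=None, matches=lambda item: True):
    """Mimic how Limit interacts with a filter, using an in-memory list.

    Returns (matching_items, scanned_count, last_evaluated_index). The index
    is None when every item in the list has been examined.
    """
    examined = items if limit is None else items[:limit]
    matching = [item for item in examined if matches(item)]
    more_remaining = len(examined) < len(items)
    last_index = len(examined) - 1 if more_remaining else None
    return matching, len(examined), last_index


table = list(range(1, 11))  # ten hypothetical items
even = lambda n: n % 2 == 0

# Limit=6 with a filter: six items are examined, only the matches are returned.
print(simulate_scan(table, limit=6, matches=even))  # ([2, 4, 6], 6, 5)
# Limit=6 without a filter: the first six items are returned.
print(simulate_scan(table, limit=6))  # ([1, 2, 3, 4, 5, 6], 6, 5)
```

In both calls the non-None last index plays the role of LastEvaluatedKey, signalling that items remain to be examined in a subsequent request.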
A Scan result is an eventually consistent read, meaning that changes
to data immediately before the scan takes place might not be included in the scan
results.
A Query result is an eventually consistent read, but you can
optionally request a strongly consistent read instead. An eventually consistent read
might not reflect the results of a recently completed PutItem or
UpdateItem operation. For more information, see Data Read and Consistency Considerations.
When you create a table, you specify your read and write capacity unit requirements.
When you issue a Query or a Scan request, you
consume the capacity units that you allocated for the table. For more information about
how Amazon DynamoDB computes the capacity units consumed by your operation, see Capacity Units Calculations for Various Operations.
By default, the Scan operation processes data sequentially. Amazon DynamoDB
returns data to the application in 1 MB increments, and an application
performs additional Scan operations to retrieve the next
1 MB of data.
The larger the table, the more time the Scan will take to
complete. In addition, a sequential Scan might not always be able to
fully utilize the table's provisioned read throughput capacity: Even though Amazon DynamoDB
distributes a large table's data across multiple physical partitions, a
Scan operation can only read one partition at a time. For this
reason, the throughput of a Scan is constrained by the maximum
throughput of a single partition.
To address these issues, the Scan operation can logically divide a
table into multiple segments, with multiple application workers
scanning the segments in parallel. Each worker can be a thread (in programming languages
that support multithreading) or an operating system process. To perform a parallel scan,
each worker issues its own Scan request with the following
parameters:
Segment — A segment to be scanned by a particular worker. Each
worker should use a different value for Segment.
TotalSegments — The total number of segments for the parallel scan. This
value must be the same as the number of workers that your application will
use.
The following diagram shows how a multithreaded application performs a parallel Scan with three degrees of parallelism:

In this diagram, the application spawns three threads and assigns each thread a
number. (Segments are zero-based, so the first number is always 0.) Each thread issues a
Scan request, setting Segment to its designated number
and setting TotalSegments to 3. Each thread scans its designated segment,
retrieving data 1 MB at a time, and returns the data to the application's main
thread.
The values for Segment and TotalSegments apply to individual
Scan requests, and you can use different values at any time. You
might need to experiment with these values, and the number of workers you use, until your
application achieves its best performance.
Note
A parallel scan with a large number of workers can easily consume all of a table's provisioned throughput; it is best to avoid such scans if the table is also incurring heavy read or write activity from other applications.
To control the amount of data returned per request, use the Limit
parameter. This can help prevent situations where one worker consumes all of the
provisioned throughput, at the expense of all other workers. For more information, see "Reduce Page Size" in Avoid Sudden Bursts of Read Activity.
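A parallel scan along these lines might be sketched in Python as follows, with one thread per segment. The sketch assumes a boto3-style client whose scan method accepts Segment, TotalSegments, Limit, and ExclusiveStartKey and can be shared safely between threads. The function names and the page_limit argument are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments, page_limit=None):
    """Scan one segment to completion, following LastEvaluatedKey between pages."""
    request = {
        "TableName": table_name,
        "Segment": segment,
        "TotalSegments": total_segments,
    }
    if page_limit is not None:
        request["Limit"] = page_limit  # smaller pages smooth out read bursts
    items = []
    while True:
        page = client.scan(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        request["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments, page_limit=None):
    """Run one worker thread per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment,
                        total_segments, page_limit)
            for segment in range(total_segments)  # segments are zero-based
        ]
        merged = []
        for future in futures:
            merged.extend(future.result())
    return merged
```

For example, parallel_scan(client, "Thread", 3, page_limit=100) would correspond to the three-worker case described earlier, with each request capped at 100 items. The table name and the values here are illustrative only and would need tuning for a real workload.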