Monitoring with Amazon CloudWatch
You can monitor Amazon DynamoDB using CloudWatch, which collects and processes raw data from DynamoDB into readable, near real-time metrics. These statistics are retained for a period of time, so that you can access historical information for a better perspective on how your web application or service is performing. By default, DynamoDB metric data is sent to CloudWatch automatically. For more information, see What is Amazon CloudWatch? and Metrics retention in the Amazon CloudWatch User Guide.
How do I use DynamoDB metrics?
The metrics reported by DynamoDB provide information that you can analyze in different ways. The following list shows some common uses for the metrics. These are suggestions to get you started, not a comprehensive list.
How can I? | Relevant metrics |
---|---|
How can I monitor the rate of TTL deletions on my table? | You can monitor |
How can I determine how much of my provisioned throughput is being used? | You can monitor |
How can I determine which requests exceed the provisioned throughput quotas of a table? | |
How can I determine if any system errors occurred? | You can monitor |
Note
You might encounter internal server errors while working with items. These are expected during the lifetime of a table. Any failed requests can be retried immediately.
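As a rough illustration of the provisioned throughput question, the following sketch converts a summed count of consumed capacity units into a utilization percentage. It assumes you already have a CloudWatch Sum statistic of consumed capacity units for one period; the function name and parameters are hypothetical and not part of any AWS API.

```python
def utilization_percent(consumed_units_sum: float,
                        period_seconds: int,
                        provisioned_units_per_second: float) -> float:
    """Return average consumed throughput as a percentage of provisioned throughput.

    consumed_units_sum is assumed to be the Sum statistic for one period.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    if provisioned_units_per_second <= 0:
        raise ValueError("provisioned_units_per_second must be positive")
    # A Sum covers the whole period, so divide by its length to get units per second.
    average_per_second = consumed_units_sum / period_seconds
    return 100.0 * average_per_second / provisioned_units_per_second


# 1,800 units consumed in a 60-second period against 50 provisioned units per second.
print(utilization_percent(1800, 60, 50))
```

Dividing by the period length matters because provisioned throughput is expressed per second, while a summed statistic covers the whole period.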
Understanding and analyzing DynamoDB response times
When analyzing latency, it's generally best to check the average. Occasional spikes in latency aren't a cause for concern. However, if the average latency is high, an underlying issue could be responsible.
There are two categories of latency: API latency and service-side latency.
DynamoDB's API latency is measured from steps 1 through 11 in the process below.
Service-side latency is measured from the moment an API request reaches the Request Router (step 4) to the moment the RR sends the result back to the application (step 11).
You can analyze service-side latency with the Amazon CloudWatch metric SuccessfulRequestLatency.
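One way to retrieve this metric programmatically is with the AWS SDK for Python (Boto3). The following is a minimal sketch, not a definitive implementation: the helper names, the table name `Orders`, and the default time window are assumptions made for illustration. It assumes the metric is published in the `AWS/DynamoDB` namespace with `TableName` and `Operation` dimensions.

```python
from datetime import datetime, timedelta, timezone


def build_latency_query(table_name, operation="GetItem", minutes=60,
                        period_seconds=300, now=None):
    """Build keyword arguments for the CloudWatch GetMetricStatistics call."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": "SuccessfulRequestLatency",
        "Dimensions": [
            {"Name": "TableName", "Value": table_name},
            {"Name": "Operation", "Value": operation},
        ],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        # Average for the trend, Maximum to see occasional spikes.
        "Statistics": ["Average", "Maximum"],
    }


def fetch_average_latency(table_name, operation="GetItem"):
    """Return (timestamp, average) pairs, oldest first.

    Requires boto3 and configured AWS credentials.
    """
    import boto3  # imported lazily so build_latency_query works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        **build_latency_query(table_name, operation)
    )
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]


if __name__ == "__main__":
    # "Orders" is a hypothetical table name.
    print(build_latency_query("Orders")["MetricName"])
```

Keeping the query construction separate from the network call makes the parameters easy to inspect without AWS credentials.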
When an application makes a DynamoDB API call, such as issuing a GetItem operation against a DynamoDB table, the following steps occur.
1. The application resolves the DynamoDB public endpoint using a local DNS server.
2. The application connects to the IP address resolved in step 1 and makes an API call.
3. The DynamoDB public endpoint takes the request and forwards it to a component called the Request Router (RR).
4. Once the API request reaches the Request Router (RR), the RR authenticates and authorizes the API call. The RR also performs throttling checks at this stage.
5. After completing all the checks, the RR hashes the partition key value that it gets from the API request. Based on the hash value, the RR finds the partition information (storage node details).
6. The Storage Nodes (SN) are the servers where customer table data is stored. A single partition (not to be confused with a primary key or partition key) consists of a set of three Storage Nodes. Of these three Storage Nodes, one acts as the leader for that partition and the remaining two act as followers.
7. If the API call is a write request or a strongly consistent read request, the RR finds the leader node for that partition and forwards the API request to that specific node. For an eventually consistent read request, the RR forwards the request at random to the leader node or to one of the follower nodes for that partition.
8. In normal circumstances, the RR reaches the Storage Node on the first attempt. If this attempt fails, the RR retries multiple times to reach the storage node. When connecting to a storage node, the RR always gives the current attempt enough time before trying to connect to a different SN. This is an internal microservice retry, not a configurable SDK retry.
9. At this stage, the API request reaches the Storage Node (SN) layer and the SN starts processing it, reading or writing data depending on the API call.
10. After successfully processing the API request, the SN returns the results or response code to the RR that originated the request.
11. Finally, the RR forwards the results to the customer application.
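The routing decision described in the steps above (hashing the partition key, then choosing the leader or a random node) can be sketched as follows. This is a simplified illustration and not DynamoDB's actual implementation: the class and function names are hypothetical, MD5 is only a stand-in for the undisclosed hash function, and the modulo mapping is an assumption made to keep the example short.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    leader: str        # name of the leader storage node
    followers: tuple   # names of the two follower storage nodes


def find_partition(partition_key: str, partitions: list) -> Partition:
    """Map a partition key value to a partition by hashing it."""
    # Assumption: MD5 plus modulo stands in for the real hash and lookup.
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partitions)
    return partitions[index]


def choose_storage_node(partition: Partition, is_write: bool,
                        consistent_read: bool = False, rng=random) -> str:
    """Pick the storage node that should receive the request."""
    if is_write or consistent_read:
        # Writes and strongly consistent reads always go to the leader.
        return partition.leader
    # Eventually consistent reads go to the leader or a follower at random.
    return rng.choice((partition.leader,) + tuple(partition.followers))


if __name__ == "__main__":
    partitions = [
        Partition("sn-a1", ("sn-a2", "sn-a3")),
        Partition("sn-b1", ("sn-b2", "sn-b3")),
    ]
    target = find_partition("user#1", partitions)
    print(choose_storage_node(target, is_write=True))
    print(choose_storage_node(target, is_write=False))
```

The same key always maps to the same partition, which is why the hash of the partition key value is enough for the Request Router to locate the data.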
Note
For most atomic operations, such as GetItem and PutItem, you can expect an average latency in single-digit milliseconds. Latency for non-atomic operations such as Query and Scan depends on many factors, including the size of the result set and the complexity of the query conditions and filters.
DynamoDB doesn't measure the amount of time an application takes to connect with the DynamoDB public endpoint, or the amount of time an application takes to download the results from the public endpoint.
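Because connection and download time fall outside what DynamoDB measures, it can help to time calls on the client as well and compare the result with the service-side metric. The following is a minimal sketch; `timed_call` is a hypothetical helper, not part of any AWS SDK.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func and return (result, elapsed_milliseconds) as seen by the client."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


if __name__ == "__main__":
    # Stand-in workload; in practice you would wrap an SDK call instead.
    result, elapsed_ms = timed_call(sum, range(1000))
    print(result, f"{elapsed_ms:.3f} ms")
```

With Boto3, for example, you could wrap a table resource's `get_item` call as `timed_call(table.get_item, Key={"pk": "user#1"})`, where `table` and the key attribute `pk` are assumptions. A client-side figure that is much larger than the service-side latency suggests the time is being spent in the network or in the client.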