Monitoring Kinesis Video Streams Metrics with CloudWatch
You can monitor a Kinesis video stream using Amazon CloudWatch, which collects and processes raw data from Kinesis Video Streams into readable, near real-time metrics. These statistics are recorded for a period of 15 months, so that you can access historical information and gain a better perspective on how your web application or service is performing.
In the Kinesis Video Streams Management Console, you can view CloudWatch metrics for a Kinesis video stream in two ways:
- In the Dashboard page, choose the Video streams tab in the Account-level metrics for Current Region section.
- Choose the Monitoring tab in the video stream's details page.
Kinesis Video Streams provides the following metrics:
Metric | Description |
---|---|
ArchivedFragmentsConsumed.Media |
Number of fragment media quota points that were consumed by all of
the APIs. For an explanation of the concept of quota points, see Fragment-metadata and fragment-media quotas. Units: Count |
ArchivedFragmentsConsumed.Metadata |
Number of fragment metadata quota points that were consumed by all
of the APIs. For an explanation of the concept of quota points, see
Fragment-metadata and fragment-media quotas. Units: Count |
|
Number of Units: Count |
|
Number of bytes received as part of Units: Bytes |
|
Number of complete fragments received as part of
Units: Count |
|
Number of complete frames received as part of
Units: Count |
|
The total number of connections to the service host. Units: Count |
|
Errors while establishing Units: Count |
|
Time difference between when the first and last bytes of a fragment are received by Kinesis Video Streams. Units: Milliseconds |
|
Time taken from when the complete fragment data is received to when it is archived. Units: Milliseconds |
|
Time difference between the request and the HTTP response from InletService while establishing the connection. Units: Milliseconds |
|
Time difference between when the first byte of a new fragment is received by Kinesis Video Streams and when the Buffering ACK is sent for the fragment. Units: Milliseconds |
|
Time difference between when the last byte of a new fragment is received by Kinesis Video Streams and when the Received ACK is sent for the fragment. Units: Milliseconds |
|
Time difference between when the last byte of a new fragment is received by Kinesis Video Streams and when the Persisted ACK is sent for the fragment. Units: Milliseconds |
|
Number of Error ACKs sent while doing Units: Count |
|
1 for each fragment successfully written; 0 for every failed fragment. The average value of this metric indicates how many complete, valid fragments are sent. Units: Count |
|
Number of Units: Count |
|
Total number of bytes sent out from the service as part of the
Units: Bytes |
|
Number of fragments sent while doing Units: Count |
|
Number of frames sent during Units: Count |
|
Time difference between the current server timestamp and the server timestamp of the last fragment sent. Units: Milliseconds |
|
The number of connections that were not successfully established. Units: Count |
|
1 for every fragment successfully sent; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Total number of bytes sent out from the service as part of the
Units: Bytes |
|
Total number of fragments sent out from the service as part of the
Units: Count |
|
Total number of frames sent out from the service as part of the
Units: Count |
|
Number of Units: Count |
|
1 for every fragment successfully sent; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Total number of bytes sent out from the service as part of the
Units: Bytes |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Total number of bytes sent out from the service as part of the
Units: Bytes |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the Units: Milliseconds |
|
Number of Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Latency of the GetClip API calls for the given video stream name. Units: Milliseconds |
|
Number of GetClip API requests for a given video stream. Units: Count |
|
1 for every successful request; 0 for every failure. The average value indicates the rate of success. Failures include both 400 (user) errors and 500 (system) errors. For more information about enabling a summary of requests and responses, including AWS request IDs, see Request/Response Summary Logging. Units: Count |
|
Total number of bytes sent out from the service as part of the GetClip API for a given video stream. Units: Bytes |
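
Any metric in the table above can also be read programmatically. The following is a minimal sketch, not a definitive implementation: it assumes the metrics are published under the `AWS/KinesisVideo` namespace with a `StreamName` dimension, and the helper names `build_metric_request` and `fetch_datapoints` are hypothetical. The request is built separately from the network call so that it can be inspected without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Kinesis Video Streams metrics use this namespace and a
# "StreamName" dimension. Verify both against your own account.
NAMESPACE = "AWS/KinesisVideo"


def build_metric_request(stream_name, metric_name, statistic="Sum",
                         minutes=60, period=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,          # seconds per datapoint
        "Statistics": [statistic],
    }


def fetch_datapoints(request):
    """Call CloudWatch and return the datapoints sorted by time."""
    import boto3  # imported lazily so the builder works without boto3

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

For example, `fetch_datapoints(build_metric_request("my-stream", "PutMedia.IncomingBytes"))` would return the last hour of incoming bytes in five-minute buckets, where `my-stream` is a placeholder stream name.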
CloudWatch Metrics Guidance
CloudWatch metrics can be useful for finding answers to the following questions:
- Is data reaching the Kinesis Video Streams service?
- Why is data not being successfully ingested by the Kinesis Video Streams service?
- Why can't the data be read from the Kinesis Video Streams service at the same rate as it's being sent from the producer?
- Why is there no video in the console, or why is the video being played with a delay?
- What is the delay in reading real-time data, and why is the client lagging behind the head of the stream?
- Is the client reading data out of the Kinesis video stream, and at what rate?
- Why can't the client read data out of the Kinesis video stream?
Is data reaching the Kinesis Video Streams service?
Relevant metrics:
- PutMedia.IncomingBytes
- PutMedia.IncomingFragments
- PutMedia.IncomingFrames
Action items:
- If there is a drop in these metrics, check if your application is still sending data to the service.
- Check the network bandwidth. If your network bandwidth is insufficient, it could be slowing down the rate at which the service receives the data.
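
A drop in the incoming metrics can be detected by comparing the most recent datapoint against a short baseline. The sketch below is one simple way to do that, under stated assumptions: the function name `has_dropped`, the 50 percent threshold, and the six-point baseline are illustrative choices, not values prescribed by Kinesis Video Streams.

```python
from statistics import mean


def has_dropped(values, drop_ratio=0.5, baseline_points=6):
    """Return True if the latest value is below drop_ratio times the
    mean of the datapoints that immediately precede it.

    values: metric values ordered oldest to newest, for example the
    per-period Sum of PutMedia.IncomingBytes.
    """
    if len(values) < 2:
        return False  # not enough history to compare against
    baseline = values[-(baseline_points + 1):-1]
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        return False  # the stream was already idle; nothing dropped
    return values[-1] < drop_ratio * baseline_mean
```

If this returns True for `PutMedia.IncomingBytes`, the action items above apply: confirm that the producer is still sending data, then check the network bandwidth.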
Why is data not being successfully ingested by the Kinesis Video Streams service?
Relevant metrics:
- PutMedia.Requests
- PutMedia.ConnectionErrors
- PutMedia.Success
- PutMedia.ErrorAckCount
Action items:
- If there is an increase in PutMedia.ConnectionErrors, look at the HTTP response/error codes received by the producer client to see what errors are occurring while establishing the connection.
- If there is a drop in PutMedia.Success or an increase in PutMedia.ErrorAckCount, look at the ack error code in the ack responses sent by the service to see why ingestion of data is failing. For more information, see AckErrorCode.Values.
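
Because PutMedia.Success records 1 for each fragment written and 0 for each failure, its average is the fraction of fragments that succeeded. The following sketch computes that fraction across several datapoints. It assumes datapoints shaped like those returned by CloudWatch `get_metric_statistics` when both the `Sum` and `SampleCount` statistics are requested; the function name `success_rate` is hypothetical.

```python
def success_rate(datapoints):
    """Return the fraction of successful fragments (0.0 to 1.0),
    or None if there were no samples in the window.

    datapoints: dicts with "Sum" (number of successes) and
    "SampleCount" (number of attempts) for each period.
    """
    attempts = sum(d["SampleCount"] for d in datapoints)
    if attempts == 0:
        return None  # no fragments were sent at all
    successes = sum(d["Sum"] for d in datapoints)
    return successes / attempts
```

Weighting by `SampleCount` rather than averaging the per-period averages keeps a quiet period with few fragments from skewing the result.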
Why can't the data be read from the Kinesis Video Streams service at the same rate as it's being sent from the producer?
Relevant metrics:
- PutMedia.FragmentIngestionLatency
- PutMedia.IncomingBytes
Action items:
- If there is a drop in these metrics, check the network bandwidth of your connections. Low-bandwidth connections could cause the data to reach the service at a lower rate.
Why is there no video in the console, or why is the video being played with a delay?
Relevant metrics:
- PutMedia.FragmentIngestionLatency
- PutMedia.FragmentPersistLatency
- PutMedia.Success
- ListFragments.Latency
- PutMedia.IncomingFragments
Action items:
- If there is an increase in PutMedia.FragmentIngestionLatency or a drop in PutMedia.IncomingFragments, check the network bandwidth and whether the data is still being sent.
- If there is a drop in PutMedia.Success, check the ack error codes. For more information, see AckErrorCode.Values.
- If there is an increase in PutMedia.FragmentPersistLatency or ListFragments.Latency, you are most likely experiencing a service issue. If the condition persists for an extended period of time, check with your customer service contact to see if there is an issue with your service.
What is the delay in reading real-time data, and why is the client lagging behind the head of the stream?
Relevant metrics:
- GetMedia.MillisBehindNow
- GetMedia.ConnectionErrors
- GetMedia.Success
Action items:
- If there is an increase in GetMedia.ConnectionErrors, the consumer might be falling behind in reading the stream because of frequent attempts to reconnect to the stream. Look at the HTTP response/error codes returned for the GetMedia request.
- If there is a drop in GetMedia.Success, the service is likely unable to send the data to the consumer. This results in dropped connections and reconnects from consumers, which in turn causes the consumer to lag behind the head of the stream.
- If there is an increase in GetMedia.MillisBehindNow, look at your bandwidth limits to see if you are receiving the data at a slower rate because of lower bandwidth.
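
Consumer lag can be watched continuously with a CloudWatch alarm on GetMedia.MillisBehindNow. The sketch below builds the arguments for the CloudWatch `put_metric_alarm` call. It assumes the `AWS/KinesisVideo` namespace and `StreamName` dimension; the alarm name pattern, the 30-second threshold, and the helper names are hypothetical and should be tuned to your workload.

```python
def build_lag_alarm(stream_name, threshold_ms=30_000, period=60,
                    evaluation_periods=5):
    """Build keyword arguments for CloudWatch put_metric_alarm that
    fire when the consumer stays behind the head of the stream."""
    return {
        "AlarmName": f"kvs-{stream_name}-consumer-lag",  # hypothetical name
        "Namespace": "AWS/KinesisVideo",                 # assumed namespace
        "MetricName": "GetMedia.MillisBehindNow",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Maximum",
        "Period": period,                      # seconds per evaluation
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        # No datapoints means no consumer is connected, not a breach.
        "TreatMissingData": "notBreaching",
    }


def create_alarm(alarm):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

Requiring several consecutive periods above the threshold avoids alarming on a brief reconnect, which the action items above describe as a normal cause of short-lived lag.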
Is the client reading data out of the Kinesis video stream, and at what rate?
Relevant metrics:
- GetMedia.OutgoingBytes
- GetMedia.OutgoingFragments
- GetMedia.OutgoingFrames
- GetMediaForFragmentList.OutgoingBytes
- GetMediaForFragmentList.OutgoingFragments
- GetMediaForFragmentList.OutgoingFrames
Action items:
- These metrics indicate the rate at which real-time and archived data is being read.
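
The outgoing metrics are totals for each CloudWatch period, so they need to be divided by the period length to give a rate. The following sketch shows the arithmetic; the function names are hypothetical, and the inputs are assumed to be the `Sum` statistic for one period.

```python
def throughput_mbps(bytes_sum, period_seconds):
    """Convert a per-period byte total, such as the Sum of
    GetMedia.OutgoingBytes, to megabits per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return bytes_sum * 8 / period_seconds / 1_000_000


def per_second(count_sum, period_seconds):
    """Convert a per-period count, such as the Sum of
    GetMedia.OutgoingFragments or GetMedia.OutgoingFrames,
    to a per-second rate."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return count_sum / period_seconds
```

Comparing the read rate from `GetMedia.OutgoingBytes` with the write rate computed the same way from `PutMedia.IncomingBytes` shows whether the consumer is keeping up with the producer.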
Why can't the client read data out of the Kinesis video stream?
Relevant metrics:
- GetMedia.ConnectionErrors
- GetMedia.Success
- GetMediaForFragmentList.Success
- PutMedia.IncomingBytes
Action items:
- If there is an increase in GetMedia.ConnectionErrors, look at the HTTP response/error codes returned by the GetMedia request. For more information, see AckErrorCode.Values.
- If you are trying to read the latest/live data, check PutMedia.IncomingBytes to see if there is data coming into the stream for the service to send to the consumers.
- If there is a drop in GetMedia.Success or GetMediaForFragmentList.Success, it's likely due to the service being unable to send the data to the consumer. If the condition persists for an extended period of time, check with your customer service contact to see if there is an issue with your service.