Monitoring the Amazon Kinesis Video Streams Edge Agent with CloudWatch

You can monitor the Amazon Kinesis Video Streams Edge Agent using Amazon CloudWatch, which collects and processes raw data into readable, near real-time metrics. These statistics are retained for 15 months, so you can use the historical information to gain a better perspective on how the Amazon Kinesis Video Streams Edge Agent is performing.

To view the metrics, do the following:

  1. Sign in to the AWS Management Console and open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. In the left navigation, under Metrics, select All Metrics.

  3. Choose the Browse tab, then select the EdgeRuntimeAgent custom namespace.
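The same namespace can also be browsed programmatically. The following is a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured. Only the namespace EdgeRuntimeAgent comes from this page; the helper names group_by_dimensions and list_edge_agent_metrics are hypothetical.

```python
from collections import defaultdict

NAMESPACE = "EdgeRuntimeAgent"  # custom namespace named on this page


def group_by_dimensions(metrics):
    """Group metric names by the sorted tuple of their dimension names.

    `metrics` is a list shaped like the "Metrics" entries returned by the
    CloudWatch ListMetrics API.
    """
    grouped = defaultdict(set)
    for metric in metrics:
        dimension_names = tuple(
            sorted(d["Name"] for d in metric.get("Dimensions", []))
        )
        grouped[dimension_names].add(metric["MetricName"])
    return {names: sorted(found) for names, found in grouped.items()}


def list_edge_agent_metrics(region_name=None):
    """Fetch every metric in the namespace (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the grouping helper works without the SDK

    cloudwatch = boto3.client("cloudwatch", region_name=region_name)
    paginator = cloudwatch.get_paginator("list_metrics")
    metrics = []
    for page in paginator.paginate(Namespace=NAMESPACE):
        metrics.extend(page["Metrics"])
    return metrics
```

Calling group_by_dimensions(list_edge_agent_metrics()) returns one entry per dimension combination, which is also a way to confirm the exact dimension keys that the agent publishes.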

Amazon Kinesis Video Streams Edge Agent publishes the following metrics under the namespace EdgeRuntimeAgent:

Dimensions: Stream name, RecordJob

  • Running – Published continuously while the RecordJob is running. Units: None. "1" is published for as long as the RecordJob is in this state.

  • FatalError – Published if the RecordJob encounters a fatal error. See the logs for additional information. Units: None. "1" is published once, when this event occurs.

  • Completed – Published when the RecordJob is completed. Units: None. "1" is published once, when this event occurs.

Dimensions: Stream name, UploadJob

  • Running – Published continuously while the UploadJob is running. Units: None. "1" is published for as long as the UploadJob is in this state.

  • FatalError – Published if the UploadJob encounters a fatal error. See the logs for additional information. Units: None. "1" is published once, when this event occurs.

  • Completed – Published when the UploadJob is completed. Units: None. "1" is published once, when this event occurs.

Dimensions: Stream name

  • PercentageSpaceUsed – The percentage used of the total space allocated in the Amazon Kinesis Video Streams Edge Agent configuration for recording media. See LocalSizeConfig for more information. Units: Percentage (scale 0–1).

Dimensions: Thing name

  • Alive – Published every minute by the Amazon Kinesis Video Streams Edge Agent, regardless of any configurations running on it. Use this metric to determine whether the agent is alive and ready to accept configurations. Units: None. "1" is published every minute.

  • RecordJobs.HealthyJobCount – Total count of running and scheduled record jobs on Amazon Kinesis Video Streams Edge Agent. Units: Count.

  • UploadJobs.HealthyJobCount – Total count of running and scheduled upload jobs on Amazon Kinesis Video Streams Edge Agent. Units: Count.

  • RecordJobs.UnhealthyJobCount – Total count of currently errored record jobs. Units: Count.

  • UploadJobs.UnhealthyJobCount – Total count of currently errored upload jobs. Units: Count.

  • RecordJobs.RunningJobCount – Total count of actively running record jobs. Units: Count.

  • UploadJobs.RunningJobCount – Total count of actively running upload jobs. Units: Count.

  • RecordJobs.EdgeConfigCount – Total count of record configurations in process on Amazon Kinesis Video Streams Edge Agent. Units: Count.

  • UploadJobs.EdgeConfigCount – Total count of upload configurations in process on Amazon Kinesis Video Streams Edge Agent. Units: Count.
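Because HealthyJobCount is described as the total of running and scheduled jobs, the number of jobs that are scheduled but not currently running can be derived by subtraction. The following is a minimal sketch, assuming both counts were sampled at the same time; the function name scheduled_job_count is hypothetical.

```python
def scheduled_job_count(healthy_job_count, running_job_count):
    """Return the number of scheduled (not yet running) jobs.

    healthy_job_count -> RecordJobs.HealthyJobCount or UploadJobs.HealthyJobCount
    running_job_count -> the matching RunningJobCount metric
    """
    if healthy_job_count < 0 or running_job_count < 0:
        raise ValueError("job counts cannot be negative")
    if running_job_count > healthy_job_count:
        # Healthy = running + scheduled, so this indicates mismatched samples.
        raise ValueError("running jobs exceed healthy jobs; samples are out of sync")
    return healthy_job_count - running_job_count
```

For example, a healthy count of 5 with a running count of 3 means that 2 jobs are scheduled.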

CloudWatch metrics guidance for Amazon Kinesis Video Streams Edge Agent

CloudWatch metrics can be useful for finding answers to the following questions:

Does the Amazon Kinesis Video Streams Edge Agent have enough space to record?

Relevant metrics: PercentageSpaceUsed

Action: No action required.
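To keep an eye on this metric, the reported values can be converted to percentages and compared against a warning level. The following is a minimal sketch that relies on the 0–1 scale stated for PercentageSpaceUsed on this page. The function name space_report and the 80 percent default are hypothetical and do not correspond to any agent setting.

```python
def space_report(datapoints, warn_at_percent=80.0):
    """Convert PercentageSpaceUsed values (scale 0-1) to percent and flag high ones.

    datapoints: iterable of (timestamp, value) pairs, value on the 0-1 scale.
    warn_at_percent: hypothetical warning level chosen by the operator.
    Returns a list of (timestamp, percent, is_warning) tuples.
    """
    report = []
    for timestamp, value in datapoints:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"expected a value on the 0-1 scale, got {value!r}")
        percent = round(value * 100.0, 2)
        report.append((timestamp, percent, percent >= warn_at_percent))
    return report
```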

Is the Amazon Kinesis Video Streams Edge Agent alive?

Relevant metrics: Alive

Action: If at any point you stop receiving this metric, it means that the Amazon Kinesis Video Streams Edge Agent encountered one or more of the following:

  • An application runtime issue: memory or other resource constraint, bug, and so on

  • The AWS IoT device that the agent is running on shut down, crashed, or was terminated

  • The AWS IoT device doesn't have network connectivity
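Since a missing Alive metric is the signal to act on, one option is a CloudWatch alarm that treats missing data as breaching. The following is a minimal sketch, assuming boto3 is available for the final call. The dimension key "ThingName", the alarm naming scheme, and the three-minute window are assumptions; confirm the real dimension key in the CloudWatch console first.

```python
def alive_alarm_params(thing_name, missing_minutes=3):
    """Build keyword arguments for the CloudWatch put_metric_alarm call.

    The alarm fires when Alive is absent (or sums to less than 1) for
    `missing_minutes` consecutive one-minute periods.
    """
    return {
        "AlarmName": f"edge-agent-not-alive-{thing_name}",  # hypothetical name
        "Namespace": "EdgeRuntimeAgent",
        "MetricName": "Alive",
        # Assumed dimension key; this page only calls it "Thing name".
        "Dimensions": [{"Name": "ThingName", "Value": thing_name}],
        "Statistic": "Sum",
        "Period": 60,  # the agent publishes "1" every minute
        "EvaluationPeriods": missing_minutes,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # no data counts as "not alive"
    }


def create_alive_alarm(thing_name):
    """Create the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parameter builder has no dependencies

    boto3.client("cloudwatch").put_metric_alarm(**alive_alarm_params(thing_name))
```

Evaluating more than one period before alarming trades a slower alert for fewer false alarms when a single data point arrives late.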

Are there any unhealthy jobs?

Relevant metrics:

  • RecordJobs.UnhealthyJobCount

  • UploadJobs.UnhealthyJobCount

Action: Inspect the logs and look for the FatalError metric.

  • If the FatalError metric is present, a fatal error was encountered and you need to manually restart the job. Inspect the logs and fix the issue before using StartEdgeConfigurationUpdate to manually restart the job.

  • If the FatalError metric isn't present, a transient (non-fatal) error was encountered and Amazon Kinesis Video Streams Edge Agent is retrying the job.

Note

To have the agent reattempt a job that encountered a fatal error, use StartEdgeConfigurationUpdate.
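The decision described above can be summarized as a small function. This is a minimal sketch of that reasoning only; the function name and the returned labels are hypothetical, and the inputs are assumed to have been read from the UnhealthyJobCount and FatalError metrics beforehand.

```python
def triage_unhealthy_jobs(unhealthy_job_count, fatal_error_published):
    """Map the two metric observations to the action described in the guidance.

    unhealthy_job_count: latest RecordJobs/UploadJobs.UnhealthyJobCount value.
    fatal_error_published: True if a FatalError data point was seen for the job.
    """
    if unhealthy_job_count <= 0:
        return "no-action"  # nothing is currently errored
    if fatal_error_published:
        # Fatal: inspect the logs, fix the issue, then restart the job
        # manually with StartEdgeConfigurationUpdate.
        return "fix-and-restart"
    # Transient (non-fatal): the agent is already retrying the job.
    return "wait-for-retry"
```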

Do any jobs need external intervention?

Relevant metrics:

  • PercentageSpaceUsed – If this exceeds a certain value, the record job is paused and resumes only when space is available (when media goes out of retention). You can send an updated configuration with a higher MaxLocalMediaSizeInMB to update the job immediately.

  • RecordJob.FatalError / UploadJob.FatalError – Investigate the agent's logs and send the configuration again for the job to resume.

Action: Make an API call with the configuration to restart jobs that encounter this problem.
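One way to resend a configuration is to read the current one, adjust it, and submit it again. The following is a minimal sketch, assuming boto3 is installed, credentials are configured, and the configuration returned by DescribeEdgeConfiguration can be passed back unchanged. The helper names with_larger_local_size and resend_configuration are hypothetical.

```python
import copy


def with_larger_local_size(edge_config, new_size_mb):
    """Return a copy of edge_config with a higher MaxLocalMediaSizeInMB."""
    updated = copy.deepcopy(edge_config)  # leave the caller's dict untouched
    size_config = updated.setdefault("DeletionConfig", {}).setdefault(
        "LocalSizeConfig", {}
    )
    current = size_config.get("MaxLocalMediaSizeInMB", 0)
    if new_size_mb <= current:
        raise ValueError(
            f"new size {new_size_mb} MB must exceed current {current} MB"
        )
    size_config["MaxLocalMediaSizeInMB"] = new_size_mb
    return updated


def resend_configuration(stream_name, new_size_mb=None):
    """Resend the stream's edge configuration (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above has no dependencies

    kvs = boto3.client("kinesisvideo")
    described = kvs.describe_edge_configuration(StreamName=stream_name)
    edge_config = described["EdgeConfig"]
    if new_size_mb is not None:
        edge_config = with_larger_local_size(edge_config, new_size_mb)
    return kvs.start_edge_configuration_update(
        StreamName=stream_name, EdgeConfig=edge_config
    )
```

Passing new_size_mb covers the paused record job case, while omitting it resends the configuration as is after the cause of a fatal error has been fixed.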