Use CloudWatch metrics to monitor Amazon Managed Service for Prometheus resources - Amazon Managed Service for Prometheus

Amazon Managed Service for Prometheus vends usage metrics to CloudWatch. These metrics provide visibility into your workspace utilization. You can find the vended metrics in the AWS/Usage and AWS/Prometheus namespaces in CloudWatch, and they are available at no charge. For more information about usage metrics, see CloudWatch usage metrics.

| CloudWatch metric name | Resource name | CloudWatch namespace | Description | Units | Valid statistics |
|---|---|---|---|---|---|
| ResourceCount | IngestionRate | AWS/Usage | Sample ingestion rate | Count per second | Average, Minimum, Maximum, Sum |
| ResourceCount | ActiveSeries | AWS/Usage | Number of active series per workspace | Count | Average, Minimum, Maximum, Sum |
| ResourceCount | ActiveAlerts | AWS/Usage | Number of active alerts per workspace | Count | Average, Minimum, Maximum, Sum |
| ResourceCount | SizeOfAlerts | AWS/Usage | Total size of all alerts in the workspace | Bytes | Average, Minimum, Maximum, Sum |
| ResourceCount | SuppressedAlerts | AWS/Usage | Number of alerts in the suppressed state per workspace. An alert can be suppressed by a silence or an inhibition. | Count | Average, Minimum, Maximum, Sum |
| ResourceCount | UnprocessedAlerts | AWS/Usage | Number of alerts in the unprocessed state per workspace. An alert is in the unprocessed state after the alert manager receives it and while it waits for the next aggregation group evaluation. | Count | Average, Minimum, Maximum, Sum |
| ResourceCount | AllAlerts | AWS/Usage | Number of alerts in any state per workspace | Count | Average, Minimum, Maximum, Sum |
| AlertManagerAlertsReceived | - | AWS/Prometheus | Total number of alerts successfully received by the alert manager | Count | Average, Minimum, Maximum, Sum |
| AlertManagerNotificationsFailed | - | AWS/Prometheus | Number of failed alert deliveries | Count | Average, Minimum, Maximum, Sum |
| AlertManagerNotificationsThrottled | - | AWS/Prometheus | Number of throttled alert notifications | Count | Average, Minimum, Maximum, Sum |
| DiscardedSamples* | - | AWS/Prometheus | Number of discarded samples, by reason | Count | Average, Minimum, Maximum, Sum |
| RuleEvaluations | - | AWS/Prometheus | Total number of rule evaluations | Count | Average, Minimum, Maximum, Sum |
| RuleEvaluationFailures | - | AWS/Prometheus | Number of rule evaluation failures in the interval | Count | Average, Minimum, Maximum, Sum |
| RuleGroupIterationsMissed | - | AWS/Prometheus | Number of rule group iterations missed in the interval | Count | Average, Minimum, Maximum, Sum |
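As a sketch of reading one of these usage metrics programmatically, the following Python code builds a CloudWatch GetMetricData query for the ActiveSeries resource in the AWS/Usage namespace and sends it through any client object that exposes a boto3-style `get_metric_data` method. The dimension names and values (`Service`, `Type`, `Resource`, `Class`, `ResourceId`) are assumptions based on the general shape of CloudWatch usage metrics, not values confirmed by this page; check the exact dimensions for your workspace in the CloudWatch console first. The helper names `active_series_query` and `fetch_active_series` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def active_series_query(workspace_id: str, period_seconds: int = 300) -> Dict[str, Any]:
    """Build one MetricDataQuery for the ActiveSeries usage metric.

    ASSUMPTION: the dimension names/values below mirror typical CloudWatch
    usage metrics; verify them in the console for your workspace.
    """
    return {
        "Id": "activeSeries",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Usage",
                "MetricName": "ResourceCount",
                "Dimensions": [
                    {"Name": "Service", "Value": "Prometheus"},
                    {"Name": "Type", "Value": "Resource"},
                    {"Name": "Resource", "Value": "ActiveSeries"},
                    {"Name": "Class", "Value": "None"},
                    {"Name": "ResourceId", "Value": workspace_id},
                ],
            },
            "Period": period_seconds,
            "Stat": "Maximum",  # one of the valid statistics for this metric
        },
        "ReturnData": True,
    }


def fetch_active_series(client: Any, workspace_id: str, hours: int = 3) -> List[float]:
    """Return ActiveSeries values for the last `hours` hours.

    `client` is any object with a boto3-style `get_metric_data` method.
    Only the first page of results is read; follow NextToken for long ranges.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = client.get_metric_data(
        MetricDataQueries=[active_series_query(workspace_id)],
        StartTime=start,
        EndTime=end,
    )
    # Each result carries parallel Timestamps and Values lists.
    return list(response["MetricDataResults"][0]["Values"])
```

With boto3 installed and credentials configured, you would pass `boto3.client("cloudwatch")` as `client` together with your own workspace ID. Injecting the client keeps the query construction testable without AWS access.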

*Some of the reasons that cause samples to be discarded are as follows.

| Reason | Meaning |
|---|---|
| greater_than_max_sample_age | Samples older than one hour are discarded. |
| new-value-for-timestamp | A duplicate sample was sent with the same timestamp as a previously recorded sample but a different value. |
| per_metric_series_limit | The user has hit the limit on active series per metric. |
| per_user_series_limit | The user has hit the limit on the total number of active series. |
| rate_limited | Ingestion was rate limited. |
| sample-out-of-order | Samples were sent out of order and cannot be processed. |
| label_value_too_long | A label value is longer than the allowed character limit. |
| max_label_names_per_series | The user has hit the limit on label names per series. |
| missing_metric_name | No metric name was provided. |
| metric_name_invalid | An invalid metric name was provided. |
| label_invalid | An invalid label was provided. |
| duplicate_label_names | Duplicate label names were provided. |
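To see which of these reasons actually occur in your account, one option is to list the DiscardedSamples metrics in the AWS/Prometheus namespace and collect their reason values. The sketch below assumes the reason is exposed as a CloudWatch dimension named `Reason`; that name is an assumption, so confirm it in the console. The helper `discarded_sample_reasons` is hypothetical and works with any client that has a boto3-style `list_metrics` method, following `NextToken` across pages.

```python
from typing import Any, Dict, Optional, Set


def discarded_sample_reasons(client: Any) -> Set[str]:
    """Collect the distinct Reason dimension values of DiscardedSamples.

    ASSUMPTION: the reason is exposed as a dimension named "Reason".
    `client` is any object with a boto3-style `list_metrics` method.
    """
    reasons: Set[str] = set()
    kwargs: Dict[str, Any] = {
        "Namespace": "AWS/Prometheus",
        "MetricName": "DiscardedSamples",
        # A dimension filter with only a Name matches any value of that dimension.
        "Dimensions": [{"Name": "Reason"}],
    }
    while True:
        page = client.list_metrics(**kwargs)
        for metric in page.get("Metrics", []):
            for dim in metric.get("Dimensions", []):
                if dim["Name"] == "Reason":
                    reasons.add(dim["Value"])
        token: Optional[str] = page.get("NextToken")
        if not token:
            return reasons
        kwargs["NextToken"] = token  # request the next page
```

The returned strings can then be matched against the reason codes in the table above to decide whether the fix lies in the client (for example, out-of-order samples) or in workspace limits.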

Note

A metric that does not exist or is missing is equivalent to that metric having a value of 0.
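Because a missing metric means 0, gaps in retrieved datapoints can be filled with zeros before you plot or compare them. The following is a minimal client-side sketch; the function name `fill_missing_with_zero` is hypothetical, and it assumes the returned timestamps are aligned to period boundaries that start at `start`.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Sequence, Tuple


def fill_missing_with_zero(
    timestamps: Sequence[datetime],
    values: Sequence[float],
    start: datetime,
    end: datetime,
    period: timedelta,
) -> List[Tuple[datetime, float]]:
    """Return one (timestamp, value) pair per period in [start, end).

    Periods without a datapoint get 0.0, because a missing metric is
    equivalent to a value of 0. ASSUMPTION: timestamps are aligned to
    period boundaries that begin at `start`.
    """
    observed: Dict[datetime, float] = dict(zip(timestamps, values))
    series: List[Tuple[datetime, float]] = []
    t = start
    while t < end:
        series.append((t, observed.get(t, 0.0)))
        t += period
    return series
```

Inside CloudWatch itself, the metric math expression `FILL(m1, 0)` performs a comparable substitution on the server side.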

Note

RuleGroupIterationsMissed, RuleEvaluations, and RuleEvaluationFailures have a RuleGroup dimension with the following structure:

RuleGroupNamespace;RuleGroup
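When you filter these rule metrics by rule group, the dimension value joins the rule group namespace and the rule group name with a semicolon. The hypothetical helpers below build and parse that value. Parsing splits on the first semicolon only; this assumes the namespace itself never contains a semicolon, which this page does not state.

```python
from typing import Tuple

SEPARATOR = ";"


def rule_group_dimension(namespace: str, rule_group: str) -> str:
    """Build a RuleGroup dimension value: "RuleGroupNamespace;RuleGroup"."""
    if SEPARATOR in namespace:
        # ASSUMPTION: a ";" in the namespace would make the value ambiguous.
        raise ValueError("rule group namespace must not contain ';'")
    return f"{namespace}{SEPARATOR}{rule_group}"


def parse_rule_group_dimension(value: str) -> Tuple[str, str]:
    """Split a RuleGroup dimension value into (namespace, rule group).

    Splits on the first ";" only, so any later ";" stays in the rule group name.
    """
    namespace, sep, rule_group = value.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"not a RuleGroup dimension value: {value!r}")
    return namespace, rule_group
```

The built string can be used as the `Value` of a `{"Name": "RuleGroup", ...}` dimension when querying RuleEvaluationFailures or RuleGroupIterationsMissed for a single rule group.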

Setting a CloudWatch alarm on Prometheus vended metrics

You can monitor usage of Prometheus resources using CloudWatch alarms.

To set an alarm on the number of ActiveSeries in Prometheus
  1. In the CloudWatch console, choose the Graphed metrics tab and scroll down to the ActiveSeries label.

    The Graphed metrics view shows only the metrics that are currently being ingested.

  2. Choose the notification icon in the Actions column.

  3. In Specify metric and conditions, enter the threshold condition in the Conditions value field and choose Next.

  4. In Configure actions, select an existing Amazon SNS topic or create a new one to receive the notification.

  5. In Add name and description, enter a name for the alarm and, optionally, a description.

  6. Choose Create alarm.
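The same alarm can be sketched in code with the CloudWatch PutMetricAlarm API. The following Python function maps each console step to a request field and sends it through any client with a boto3-style `put_metric_alarm` method. The usage-metric dimensions (`Service`, `Type`, `Resource`, `Class`, `ResourceId`), the 300-second period, and the single evaluation period are illustrative assumptions, not values from this page; the function names are hypothetical.

```python
from typing import Any, Dict, List


def active_series_alarm_params(
    alarm_name: str,
    workspace_id: str,
    threshold: float,
    sns_topic_arn: str,
    description: str = "",
) -> Dict[str, Any]:
    """Build PutMetricAlarm arguments that mirror the console steps.

    ASSUMPTION: the dimensions below match the ActiveSeries usage metric
    shown in the console; verify them before use.
    """
    dimensions: List[Dict[str, str]] = [
        {"Name": "Service", "Value": "Prometheus"},
        {"Name": "Type", "Value": "Resource"},
        {"Name": "Resource", "Value": "ActiveSeries"},
        {"Name": "Class", "Value": "None"},
        {"Name": "ResourceId", "Value": workspace_id},
    ]
    params: Dict[str, Any] = {
        "AlarmName": alarm_name,  # step 5: alarm name
        "Namespace": "AWS/Usage",
        "MetricName": "ResourceCount",
        "Dimensions": dimensions,
        "Statistic": "Maximum",
        "Period": 300,  # illustrative choice
        "EvaluationPeriods": 1,  # illustrative choice
        "Threshold": threshold,  # step 3: threshold condition
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # step 4: SNS topic to notify
        # A missing metric is equivalent to 0, which is below the threshold.
        "TreatMissingData": "notBreaching",
    }
    if description:
        params["AlarmDescription"] = description  # step 5: optional description
    return params


def create_active_series_alarm(client: Any, **kwargs: Any) -> None:
    """Create the alarm (step 6) through a boto3-style CloudWatch client."""
    client.put_metric_alarm(**active_series_alarm_params(**kwargs))
```

Setting `TreatMissingData` to `notBreaching` follows from the rule that a missing metric equals 0: for an upper-bound alarm on ActiveSeries, a gap should not trigger a notification.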