Use CloudWatch metrics to monitor Amazon Managed Service for Prometheus resources

Amazon Managed Service for Prometheus vends usage metrics to CloudWatch. These metrics provide visibility into your workspace utilization. You can find the vended metrics in the AWS/Usage and AWS/Prometheus namespaces in CloudWatch, where they are available at no charge. For more information about usage metrics, see CloudWatch usage metrics.

CloudWatch metric name	Resource name	CloudWatch namespace	Description
ResourceCount^*	RemoteWriteTPS	`AWS/Usage`	Remote write operations per second
ResourceCount^*	QueryMetricsTPS	`AWS/Usage`	Query operations per second
ResourceCount	IngestionRate	`AWS/Usage`	Sample ingestion rate. Units: count per second. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	ActiveSeries	`AWS/Usage`	Number of active series per workspace. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	ActiveAlerts	`AWS/Usage`	Number of active alerts per workspace. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	SizeOfAlerts	`AWS/Usage`	Total size of all alerts in the workspace, in bytes. Units: bytes. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	SuppressedAlerts	`AWS/Usage`	Number of alerts in suppressed state per workspace. An alert can be suppressed by a silence or inhibition. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	UnprocessedAlerts	`AWS/Usage`	Number of alerts in unprocessed state per workspace. An alert is in unprocessed state once it is received by AlertManager, but is waiting for the next aggregation group evaluation. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
ResourceCount	AllAlerts	`AWS/Usage`	Number of alerts in any state per workspace. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
AlertManagerAlertsReceived	-	`AWS/Prometheus`	Total successful alerts received by alert manager. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
AlertManagerNotificationsFailed	-	`AWS/Prometheus`	Number of failed alert deliveries. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
AlertManagerNotificationsThrottled	-	`AWS/Prometheus`	Number of throttled alerts. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
DiscardedSamples^**	-	`AWS/Prometheus`	Number of discarded samples by reason. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
QuerySamplesProcessed	-	`AWS/Prometheus`	Rate of query samples processed. Units: count per second. Valid Statistics: Average, Minimum, Maximum, Sum.
RuleEvaluations	-	`AWS/Prometheus`	Total number of rule evaluations. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
RuleEvaluationFailures	-	`AWS/Prometheus`	Number of rule evaluation failures in the interval. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.
RuleGroupIterationsMissed	-	`AWS/Prometheus`	Number of rule group iterations missed in the interval. Units: count. Valid Statistics: Average, Minimum, Maximum, Sum.

^*TPS metrics are generated every minute and are a per-second average over that minute. Short burst periods will not be captured in the TPS metrics.
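
To illustrate why a one-minute average can hide a short burst, here is a minimal sketch. The traffic numbers are hypothetical and the helper name `per_second_average` is an assumption made for illustration; it is not part of any AWS API.

```python
def per_second_average(requests_per_second):
    """Average a minute of per-second request counts into one TPS value."""
    return sum(requests_per_second) / len(requests_per_second)


# Hypothetical traffic: 59 quiet seconds at 10 requests each,
# followed by a single burst second at 1000 requests.
minute = [10] * 59 + [1000]

average_tps = per_second_average(minute)
peak_tps = max(minute)

print(f"average TPS: {average_tps:.1f}, peak TPS: {peak_tps}")
```

In this sketch the reported average stays far below the one-second peak, so an alarm on the TPS metric alone would not see the burst.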

^**Some of the reasons that cause samples to be discarded are as follows.

Reason	Meaning
greater_than_max_sample_age	Samples older than one hour are discarded.
new-value-for-timestamp	A sample is sent with the same timestamp as a previously recorded sample but with a different value.
per_metric_series_limit	User has hit the active series per metric limit.
per_user_series_limit	User has hit the total number of active series limit.
rate_limited	Ingestion rate limited.
sample-out-of-order	Samples are sent out of order and cannot be processed.
label_value_too_long	Label value is longer than allowed character limit.
max_label_names_per_series	User has hit the limit on label names per metric.
missing_metric_name	Metric name is not provided.
metric_name_invalid	Invalid metric name provided.
label_invalid	Invalid label provided.
duplicate_label_names	Duplicate label names provided.

Note

A metric that is missing or does not exist is equivalent to that metric having a value of 0.

Note

RuleGroupIterationsMissed, RuleEvaluations, and RuleEvaluationFailures have a RuleGroup dimension with the following structure:

RuleGroupNamespace;RuleGroup
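
As a rough sketch of working with this dimension value, the helpers below build and split a `RuleGroupNamespace;RuleGroup` string. The function names are hypothetical, and the code assumes the namespace name itself contains no semicolon, which the source does not confirm.

```python
def join_rule_group_dimension(namespace, rule_group):
    """Build a RuleGroup dimension value from its two parts."""
    return f"{namespace};{rule_group}"


def split_rule_group_dimension(value):
    """Split a RuleGroup dimension value at the first semicolon.

    Assumption: the namespace name does not contain a semicolon.
    """
    namespace, separator, rule_group = value.partition(";")
    if not separator or not namespace or not rule_group:
        raise ValueError(f"expected 'RuleGroupNamespace;RuleGroup', got {value!r}")
    return namespace, rule_group


# Hypothetical names, used only for illustration.
dimension_value = join_rule_group_dimension("my-namespace", "my-rule-group")
namespace, rule_group = split_rule_group_dimension(dimension_value)
print(namespace, rule_group)
```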

Setting a CloudWatch alarm on Prometheus vended metrics

You can monitor usage of Prometheus resources using CloudWatch alarms.

To set an alarm on the number of ActiveSeries in Prometheus

1. Choose the Graphed metrics tab and scroll down to the ActiveSeries label.
   In the Graphed metrics view, only the metrics currently being ingested appear.
2. Choose the notification icon in the Actions column.
3. In Specify metric and conditions, enter the threshold condition in the Conditions value field and choose Next.
4. In Configure actions, select an existing SNS topic or create a new SNS topic to send the notification to.
5. In Add name and description, add the name of the alarm and an optional description.
6. Choose Create alarm.
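
The console steps above could also be approximated in code. The following is a minimal sketch that only builds the parameters for the CloudWatch `PutMetricAlarm` operation. The dimension values (`Service`, `Type`, `Resource`, `Class`), the period, the statistic, the alarm name, and the SNS topic ARN are all assumptions made for illustration; check the dimensions shown for the metric in your own account before relying on them.

```python
def build_active_series_alarm(alarm_name, threshold, sns_topic_arn):
    """Return keyword arguments for a PutMetricAlarm call on ActiveSeries.

    The dimension values below are assumptions, not confirmed by the source.
    """
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "ActiveSeries exceeded the configured threshold",
        "Namespace": "AWS/Usage",
        "MetricName": "ResourceCount",
        "Dimensions": [
            {"Name": "Service", "Value": "Prometheus"},   # assumed value
            {"Name": "Type", "Value": "Resource"},        # assumed value
            {"Name": "Resource", "Value": "ActiveSeries"},
            {"Name": "Class", "Value": "None"},           # assumed value
        ],
        "Statistic": "Maximum",
        "Period": 300,                 # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        # A missing metric is treated like a value of 0, so it should not alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical alarm name, threshold, and SNS topic ARN.
params = build_active_series_alarm(
    alarm_name="amp-active-series-high",
    threshold=1_000_000,
    sns_topic_arn="arn:aws:sns:us-east-1:111122223333:example-topic",
)
print(params["Namespace"], params["MetricName"], params["Threshold"])
```

With the AWS SDK for Python installed and credentials configured, these parameters could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.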
