Amazon Managed Service for Prometheus vends usage metrics to CloudWatch. These metrics provide visibility into your
workspace utilization. The vended metrics can be found in the AWS/Usage and AWS/Prometheus namespaces in CloudWatch.
These metrics are available in CloudWatch at no charge. For more information about usage metrics, see CloudWatch usage metrics.
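As a rough illustration of where these metrics live, the following sketch lists whatever CloudWatch currently reports in the two namespaces. It assumes a boto3 CloudWatch client is passed in; the function name `list_vended_metrics` is hypothetical. Note that AWS/Usage is shared with other AWS services, so the result may include metrics unrelated to Prometheus.

```python
from typing import Any, Dict, Iterable, List

# Namespaces named in the text above.
NAMESPACES = ("AWS/Usage", "AWS/Prometheus")


def list_vended_metrics(
    cloudwatch: Any, namespaces: Iterable[str] = NAMESPACES
) -> List[Dict[str, Any]]:
    """Return every metric CloudWatch currently lists in the given namespaces."""
    found: List[Dict[str, Any]] = []
    paginator = cloudwatch.get_paginator("list_metrics")
    for namespace in namespaces:
        # list_metrics is paginated, so walk every page for each namespace.
        for page in paginator.paginate(Namespace=namespace):
            found.extend(page.get("Metrics", []))
    return found


# Example usage (requires boto3, AWS credentials, and a configured region):
#   import boto3
#   for metric in list_vended_metrics(boto3.client("cloudwatch")):
#       dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
#       print(metric["Namespace"], metric["MetricName"], dims)
```

Printing the dimensions of each returned metric is a simple way to see which dimension names and values your workspace actually publishes before you build queries or alarms on them.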
CloudWatch metric name | Resource name | CloudWatch namespace | Description |
---|---|---|---|
ResourceCount* | RemoteWriteTPS | AWS/Usage | Remote write operations per second. |
ResourceCount* | QueryMetricsTPS | AWS/Usage | Query operations per second. |
ResourceCount | IngestionRate | AWS/Usage | Sample ingestion rate. Units: count per second. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | ActiveSeries | AWS/Usage | Number of active series per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | ActiveAlerts | AWS/Usage | Number of active alerts per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | SizeOfAlerts | AWS/Usage | Total size of all alerts in the workspace. Units: bytes. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | SuppressedAlerts | AWS/Usage | Number of alerts in suppressed state per workspace. An alert can be suppressed by a silence or inhibition. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | UnprocessedAlerts | AWS/Usage | Number of alerts in unprocessed state per workspace. An alert is in unprocessed state once it is received by AlertManager, but is waiting for the next aggregation group evaluation. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
ResourceCount | AllAlerts | AWS/Usage | Number of alerts in any state per workspace. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
AlertManagerAlertsReceived | - | AWS/Prometheus | Total successful alerts received by alert manager. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
AlertManagerNotificationsFailed | - | AWS/Prometheus | Number of failed alert deliveries. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
AlertManagerNotificationsThrottled | - | AWS/Prometheus | Number of throttled alerts. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
DiscardedSamples** | - | AWS/Prometheus | Number of discarded samples by reason. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
QuerySamplesProcessed | - | AWS/Prometheus | Rate of query samples processed. Units: count per second. Valid statistics: Average, Minimum, Maximum, Sum. |
RuleEvaluations | - | AWS/Prometheus | Total number of rule evaluations. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
RuleEvaluationFailures | - | AWS/Prometheus | Number of rule evaluation failures in the interval. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
RuleGroupIterationsMissed | - | AWS/Prometheus | Number of rule group iterations missed in the interval. Units: count. Valid statistics: Average, Minimum, Maximum, Sum. |
*TPS metrics are generated every minute and are a per-second average over that minute. Short burst periods will not be captured in the TPS metrics.
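To make the averaging effect concrete, here is a small sketch with made-up numbers (the traffic pattern is hypothetical, not measured data). It shows how a short burst is flattened when a minute of per-second counts is reduced to a single per-second average.

```python
def per_second_average(requests_per_second):
    """Average one minute of per-second request counts into a single value."""
    return sum(requests_per_second) / len(requests_per_second)


# Hypothetical minute: a 5-second burst of 600 requests/s, then 55 idle seconds.
burst_minute = [600] * 5 + [0] * 55

reported_tps = per_second_average(burst_minute)
peak_tps = max(burst_minute)

print(f"reported TPS: {reported_tps:.0f}, actual peak: {peak_tps}")
# prints "reported TPS: 50, actual peak: 600"
```

In this example the reported value is 50 even though the workspace briefly saw 600 requests per second, so a TPS graph that stays well under a limit does not rule out short-lived throttling.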
**Some of the reasons that cause samples to be discarded are as follows.
Reason | Meaning |
---|---|
greater_than_max_sample_age | Samples are older than one hour. |
new-value-for-timestamp | A sample is sent with a timestamp that was previously recorded, but with a different value. |
per_metric_series_limit | User has hit the limit on active series per metric. |
per_user_series_limit | User has hit the limit on the total number of active series. |
rate_limited | Ingestion is rate limited. |
sample-out-of-order | Samples are sent out of order and cannot be processed. |
label_value_too_long | Label value is longer than the allowed character limit. |
max_label_names_per_series | User has hit the limit on label names per metric. |
missing_metric_name | Metric name is not provided. |
metric_name_invalid | Invalid metric name provided. |
label_invalid | Invalid label provided. |
duplicate_label_names | Duplicate label names provided. |
Note
A metric that does not exist or is missing is equivalent to that metric having a value of 0.
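When post-processing datapoints yourself, this rule means gaps can be filled with zeros. The sketch below is one way to do that, assuming datapoints arrive as `(timestamp, value)` pairs aligned to a fixed period; the helper name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fill_missing_with_zero(datapoints, start, end, period_seconds=60):
    """Return a (timestamp, value) pair for every period in [start, end),
    using 0.0 wherever no datapoint was reported."""
    by_time = dict(datapoints)
    step = timedelta(seconds=period_seconds)
    filled = []
    current = start
    while current < end:
        filled.append((current, by_time.get(current, 0.0)))
        current += step
    return filled


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
# Hypothetical DiscardedSamples datapoints: only minutes 1 and 3 were reported.
sparse = [
    (start + timedelta(minutes=1), 12.0),
    (start + timedelta(minutes=3), 4.0),
]
dense = fill_missing_with_zero(sparse, start, start + timedelta(minutes=5))
print([value for _, value in dense])
# prints [0.0, 12.0, 0.0, 4.0, 0.0]
```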
Note
RuleGroupIterationsMissed, RuleEvaluations, and RuleEvaluationFailures have the RuleGroup dimension, which has the following structure: RuleGroupNamespace;RuleGroup
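If you need to work with this dimension value in scripts, a minimal sketch for splitting and rebuilding it could look like the following. The function and type names are hypothetical, and splitting at the first semicolon is an assumption: the text does not say whether either name may itself contain a semicolon.

```python
from typing import NamedTuple


class RuleGroupDimension(NamedTuple):
    namespace: str  # the RuleGroupNamespace part
    group: str      # the RuleGroup part


def parse_rule_group_dimension(value: str) -> RuleGroupDimension:
    """Split a 'RuleGroupNamespace;RuleGroup' value at the first semicolon."""
    namespace, separator, group = value.partition(";")
    if not separator or not namespace or not group:
        raise ValueError(f"expected 'RuleGroupNamespace;RuleGroup', got {value!r}")
    return RuleGroupDimension(namespace, group)


def format_rule_group_dimension(namespace: str, group: str) -> str:
    """Build the dimension value from its two parts."""
    return f"{namespace};{group}"


# Hypothetical names, for illustration only.
parsed = parse_rule_group_dimension("my-rules-namespace;my-rule-group")
print(parsed.namespace, parsed.group)
```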
Setting a CloudWatch alarm on Prometheus vended metrics
You can monitor usage of Prometheus resources using CloudWatch alarms.
To set an alarm on the number of ActiveSeries in Prometheus:
- Choose the Graphed metrics tab and scroll down to the ActiveSeries label. The Graphed metrics view shows only the metrics that are currently being ingested.
- Choose the notification icon in the Actions column.
- In Specify metric and conditions, enter the threshold condition in the Conditions value field and choose Next.
- In Configure actions, select an existing SNS topic or create a new SNS topic to send the notification to.
- In Add name and description, add the name of the alarm and an optional description.
- Choose Create alarm.
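The same alarm can also be created programmatically. The following is a sketch only, under these assumptions: the ActiveSeries usage metric is published as ResourceCount in the AWS/Usage namespace, and the dimension names and values must be copied from the metric as it appears in your CloudWatch console. The alarm name, the period, and the example dimensions are hypothetical placeholders, not values confirmed by this guide.

```python
from typing import Any, Dict, List


def active_series_alarm_params(
    threshold: float,
    sns_topic_arn: str,
    dimensions: List[Dict[str, str]],
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": "amp-active-series-high",  # hypothetical name
        "AlarmDescription": "Active series count is above the threshold",
        "Namespace": "AWS/Usage",         # assumption, see lead-in
        "MetricName": "ResourceCount",    # assumption, see lead-in
        "Dimensions": dimensions,         # copy from your own metric
        "Statistic": "Maximum",
        "Period": 300,                    # seconds; hypothetical choice
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # A missing metric counts as 0, so missing data should not trigger the alarm.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # SNS topic that receives the notification
    }


# Placeholder dimensions and ARN: replace with the values shown for your metric.
example_dimensions = [
    {"Name": "Resource", "Value": "ActiveSeries"},
]
params = active_series_alarm_params(
    threshold=1_000_000,
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:example-topic",
    dimensions=example_dimensions,
)

# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review or version the alarm definition before anything is sent to CloudWatch.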