Send high-availability data with Prometheus or the Prometheus Operator
With Amazon Managed Service for Prometheus, you can use multiple Prometheus instances as servers in high-availability mode. This section shows you how to set up Prometheus servers as collectors with a high-availability configuration, so Amazon Managed Service for Prometheus deduplicates your metrics and doesn't charge you twice.
When you set up deduplication, Amazon Managed Service for Prometheus makes one Prometheus instance a leader replica and ingests data samples only from that replica. If the leader replica stops sending data samples to Amazon Managed Service for Prometheus for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance a leader replica and ingests data from the new leader.
If you do not set up deduplication, you will be charged for all data samples that are sent to Amazon Managed Service for Prometheus. These data samples include duplicate samples.
Send high-availability data to Amazon Managed Service for Prometheus with Prometheus
To set up a high-availability configuration with Prometheus, you must apply external labels on all instances of a high-availability group, so Amazon Managed Service for Prometheus can identify them. Use the cluster label to identify a Prometheus instance agent as part of a high-availability group. Use the __replica__ label to identify each replica in the group separately. You need to apply both the __replica__ and cluster labels for deduplication to work.
The __replica__ label is formatted with two underscore symbols before and after the word replica.
Example: code snippets
In the following code snippets, the cluster label identifies the Prometheus instance agent prom-team1, and the __replica__ label identifies the replicas replica1 and replica2.
    cluster: prom-team1
    __replica__: replica1

    cluster: prom-team1
    __replica__: replica2
When Amazon Managed Service for Prometheus accepts data samples from high-availability replicas with these labels, it strips the __replica__ label before storing them. As a result, each of your current series is stored once (a 1:1 mapping) instead of once per replica. The cluster label is kept.
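As a minimal sketch of where these labels could live, the following prometheus.yml fragment sets them under global.external_labels for the first replica. The Region, workspace ID, and the use of SigV4 signing on remote_write are assumptions for illustration and do not come from this page; substitute your own values.

```yaml
# prometheus.yml for the first replica (a sketch, not a complete configuration).
global:
  external_labels:
    cluster: prom-team1      # same value on every replica in the group
    __replica__: replica1    # unique per replica; use replica2 on the second instance

remote_write:
  # Placeholder endpoint: replace <region> and <workspace-id> with your own values.
  - url: "https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"
    sigv4:
      region: "<region>"
```

The second replica would use the same file with only the __replica__ value changed, so that both instances share one cluster value.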
Send high-availability data to Amazon Managed Service for Prometheus with the Prometheus Operator
To set up a high-availability configuration with the Prometheus Operator, you must apply external labels on all instances of a high-availability group, so Amazon Managed Service for Prometheus can identify them. You must also set the attributes replicaExternalLabelName and externalLabels on the Prometheus Operator Helm chart.
Example: YAML header
In the following YAML header, cluster is added to externalLabels to identify a Prometheus instance agent as part of a high-availability group, and replicaExternalLabelName identifies each replica in the group.
    replicaExternalLabelName: __replica__
    externalLabels:
      cluster: prom-dev
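For context, here is a sketch of how those two attributes might sit in a Helm values file. It assumes the kube-prometheus-stack chart layout, where Prometheus settings are nested under prometheus.prometheusSpec; this page does not name a chart, so check the nesting against the chart you actually use. The replica count of 2 is also an illustrative assumption.

```yaml
# values.yaml (sketch; assumes the kube-prometheus-stack chart layout)
prometheus:
  prometheusSpec:
    replicas: 2                            # two replicas form the high-availability group
    replicaExternalLabelName: __replica__  # label name only; the Operator fills in a per-replica value
    externalLabels:
      cluster: prom-dev                    # shared by every replica in the group
```

With this layout you set only the name of the replica label. The Operator is expected to supply a distinct value for each replica, so you do not list replica names by hand.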
Send high-availability data to Amazon Managed Service for Prometheus with AWS Distro for OpenTelemetry
AWS Distro for OpenTelemetry (ADOT) is a secure and production-ready distribution of the OpenTelemetry project. ADOT provides you with source APIs, libraries, and agents, so you can collect distributed traces and metrics for application monitoring. For information about ADOT, see About AWS Distro for OpenTelemetry.
To set up ADOT with a high-availability configuration, you must configure an ADOT collector container image and apply the external labels cluster and __replica__ to the AWS Prometheus remote write exporter. This exporter sends your scraped metrics to your Amazon Managed Service for Prometheus workspace via the remote_write endpoint. When you set these labels on the remote write exporter, you prevent duplicate metrics from being kept while redundant replicas run. For more information about the AWS Prometheus remote write exporter, see Getting started with Prometheus remote write exporter for Amazon Managed Service for Prometheus.
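The following collector configuration is a rough sketch of how the two labels could be attached at the exporter. It assumes the prometheusremotewrite exporter with an external_labels map and the sigv4auth extension, as found in recent OpenTelemetry Collector distributions. Exporter names and fields have changed across ADOT releases, so verify them against your version. The scrape target, Region, and workspace ID are placeholders.

```yaml
# ADOT collector configuration for the first replica (sketch under the assumptions above).
extensions:
  sigv4auth:
    region: "<region>"        # placeholder
    service: aps

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: example-app              # hypothetical scrape job
          static_configs:
            - targets: ["localhost:8080"]    # hypothetical target

exporters:
  prometheusremotewrite:
    # Placeholder endpoint: replace <region> and <workspace-id> with your own values.
    endpoint: "https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write"
    auth:
      authenticator: sigv4auth
    external_labels:
      cluster: prom-team1      # same value on every collector replica
      __replica__: replica1    # unique per collector replica

service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```

Each redundant collector would run the same configuration with a different __replica__ value.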
FAQ: High availability configuration
Should I include the __replica__ value in another label to track the sample points?
No, this is not recommended. In a high-availability setting, Amazon Managed Service for Prometheus ensures data samples are not duplicated by electing a leader in the cluster of Prometheus instances. If the leader replica stops sending data samples for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance a leader replica and ingests data from the new leader. Including the __replica__ value in another label may cause issues like:
- Querying a count in PromQL may return a higher than expected value while a new leader is being elected.
- The number of active series increases while a new leader is being elected and may reach the active series limits. See AMP Quotas for more information.