Send high-availability data with Prometheus or the Prometheus Operator - Amazon Managed Service for Prometheus

Send high-availability data with Prometheus or the Prometheus Operator

With Amazon Managed Service for Prometheus, you can use multiple Prometheus instances as servers in high-availability mode. This section shows you how to set up Prometheus servers as collectors with a high-availability configuration, so Amazon Managed Service for Prometheus deduplicates your metrics and doesn't charge you twice.

When you set up deduplication, Amazon Managed Service for Prometheus makes one Prometheus instance a leader replica and ingests data samples only from that replica. If the leader replica stops sending data samples to Amazon Managed Service for Prometheus for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance a leader replica and ingests data from the new leader.

Important

If you do not set up deduplication, you are charged for all data samples that are sent to Amazon Managed Service for Prometheus, including duplicate samples.

Send high-availability data to Amazon Managed Service for Prometheus with Prometheus

To set up a high-availability configuration with Prometheus, you must apply external labels on all instances of a high-availability group, so Amazon Managed Service for Prometheus can identify them. Use the cluster label to identify a Prometheus instance as part of a high-availability group. Use the __replica__ label to identify each replica in the group separately. You must apply both the __replica__ and cluster labels for deduplication to work.

Note

The __replica__ label is formatted with two underscore symbols before and after the word replica.

Example: code snippets

In the following code snippets, the cluster label identifies the high-availability group prom-team1, and the __replica__ label identifies the replicas replica1 and replica2.

cluster: prom-team1
__replica__: replica1

cluster: prom-team1
__replica__: replica2

When Amazon Managed Service for Prometheus stores data samples from high-availability replicas that carry these labels, it strips the __replica__ label as it accepts the samples. As a result, each of your series is stored once rather than once per replica. The cluster label is kept.
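As a minimal sketch of how these labels might be set, the following prometheus.yml fragment configures the first replica. The workspace ID ws-EXAMPLE, the Region us-west-2, and the scrape interval are placeholders chosen for illustration, not values from this guide. The second replica would use the same file with __replica__ set to replica2.

```yaml
# prometheus.yml for the first replica (illustrative sketch)
global:
  scrape_interval: 30s            # placeholder value
  external_labels:
    cluster: prom-team1           # same on every replica in the group
    __replica__: replica1         # unique per replica; use replica2 on the other instance

remote_write:
  # Placeholder workspace URL; substitute your own Region and workspace ID.
  - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    sigv4:
      region: us-west-2           # placeholder Region
```

Because external labels are attached to every sample a Prometheus server sends over remote write, no per-job relabeling should be needed for the two labels to reach the workspace.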

Send high-availability data to Amazon Managed Service for Prometheus with the Prometheus Operator

To set up a high-availability configuration with the Prometheus Operator, you must apply external labels on all instances of a high-availability group, so Amazon Managed Service for Prometheus can identify them. You must also set the attributes replicaExternalLabelName and externalLabels on the Prometheus Operator Helm chart.

Example: YAML header

In the following YAML header, cluster is added to externalLabels to identify a Prometheus instance as part of a high-availability group, and replicaExternalLabelName sets the name of the label that identifies each replica in the group.

replicaExternalLabelName: __replica__
externalLabels:
  cluster: prom-dev
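For context, here is a sketch of where these attributes might sit in a Helm values file. It assumes the Operator is installed through the community kube-prometheus-stack chart, which is an assumption and not something this guide specifies; other charts may nest the keys differently. The replica count of 2 is illustrative.

```yaml
# values.yaml fragment (assumes the kube-prometheus-stack chart layout)
prometheus:
  prometheusSpec:
    replicas: 2                             # illustrative: two replicas in the group
    replicaExternalLabelName: __replica__   # name of the per-replica external label
    externalLabels:
      cluster: prom-dev                     # same for every replica in the group
```

With this layout you do not set a replica value yourself. The Operator is expected to fill in the value of the replica label for each Prometheus pod, so each replica reports a distinct __replica__ value.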

Send high-availability data to Amazon Managed Service for Prometheus with AWS Distro for OpenTelemetry

AWS Distro for OpenTelemetry (ADOT) is a secure and production-ready distribution of the OpenTelemetry project. ADOT provides you with source APIs, libraries, and agents, so you can collect distributed traces and metrics for application monitoring. For information about ADOT, see About AWS Distro for OpenTelemetry.

To set up ADOT with a high-availability configuration, you must configure an ADOT collector container image and apply the external labels cluster and __replica__ to the AWS Prometheus remote write exporter. This exporter sends your scraped metrics to your Amazon Managed Service for Prometheus workspace through the remote_write endpoint. When you set these labels on the remote write exporter, duplicate metrics are not kept while redundant replicas run. For more information about the AWS Prometheus remote write exporter, see Getting started with Prometheus remote write exporter for Amazon Managed Service for Prometheus.
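The following collector configuration fragment is a hedged sketch of how the two labels could be applied on the exporter. It assumes a collector version that ships the prometheusremotewrite exporter and the sigv4auth extension; the exporter key, the workspace ID ws-EXAMPLE, and the Region are assumptions for illustration, and your ADOT release may use different names. Receivers and the service pipeline are omitted.

```yaml
# ADOT collector config fragment (sketch; receivers and service pipeline omitted)
extensions:
  sigv4auth:
    region: us-west-2             # placeholder Region

exporters:
  prometheusremotewrite:          # assumed exporter name; check your collector version
    # Placeholder workspace URL; substitute your own Region and workspace ID.
    endpoint: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    auth:
      authenticator: sigv4auth
    external_labels:
      cluster: prom-team1         # same on every collector in the group
      __replica__: replica1       # unique per collector replica
```

Each redundant collector would run the same configuration with a different __replica__ value.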

FAQ: High-availability configuration

Should I copy the value of __replica__ into another label to track the sample points?

No, this is not recommended. In a high-availability setting, Amazon Managed Service for Prometheus ensures that data samples are not duplicated by electing a leader in the cluster of Prometheus instances. If the leader replica stops sending data samples for 30 seconds, Amazon Managed Service for Prometheus automatically makes another Prometheus instance the leader replica and ingests data from the new leader. Copying the replica value into another label can cause issues such as:

  • A count query in PromQL may return a higher value than expected while a new leader is being elected.

  • The number of active series increases while a new leader is being elected, and it may reach the active series limit. For more information, see AMP Quotas.
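To make the question concrete, the following fragment sketches the discouraged setup. The label name replica_id is hypothetical and used only for illustration.

```yaml
# Discouraged: duplicating the replica value in a second external label
global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1     # stripped when samples are accepted
    replica_id: replica1      # hypothetical extra label; it is NOT stripped
```

Because only the __replica__ label is stripped, the extra label stays on every stored series. When leadership moves from replica1 to replica2, samples from the new leader would arrive with a different replica_id value and so appear as new series, which is consistent with the inflated counts and the increase in active series described above.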