
Using an AWS managed collector

To use an Amazon Managed Service for Prometheus collector, you must create a scraper that discovers and pulls metrics from your Amazon EKS cluster.

  • You can create a scraper as part of your Amazon EKS cluster creation. For more information about creating an Amazon EKS cluster, including creating a scraper, see Creating an Amazon EKS cluster in the Amazon EKS User Guide.

  • You can create your own scraper programmatically, either with the AWS API or by using the AWS CLI.

Note

Amazon Managed Service for Prometheus workspaces created with customer managed keys cannot use AWS managed collectors for ingestion.

An Amazon Managed Service for Prometheus collector scrapes metrics that are Prometheus-compatible. For more information about Prometheus-compatible metrics, see What are Prometheus-compatible metrics?.

Note

Scraping metrics from a cluster may incur charges for network usage, for example, for cross-Region traffic. One way to optimize these costs is to configure your /metrics endpoint to compress the provided metrics (for example, with gzip), reducing the data that must be moved across the network. How to do this depends on the application or library providing the metrics; some libraries gzip by default.
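For example, you can check whether a /metrics endpoint already serves compressed responses by using curl. This is a minimal sketch; the IP address and port are placeholders for your own endpoint:

curl -s --compressed -D - -o /dev/null http://10.0.12.34:8080/metrics

The --compressed flag asks for a compressed response; a Content-Encoding: gzip header in the dumped response headers indicates that the endpoint compresses its output.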

The following topics describe how to create, manage, and configure scrapers.

Create a scraper

An Amazon Managed Service for Prometheus collector consists of a scraper that discovers and collects metrics from an Amazon EKS cluster. Amazon Managed Service for Prometheus manages the scraper for you, giving you the scalability, security, and reliability that you need, without having to manage any instances, agents, or scrapers yourself.

A scraper is automatically created for you when you create an Amazon EKS cluster through the Amazon EKS console. However, in some situations you might want to create a scraper yourself. For example, if you want to add an AWS managed collector to an existing Amazon EKS cluster, or if you want to change the configuration of an existing collector.

You can create a scraper using either the AWS API or the AWS CLI.

There are a few prerequisites for creating your own scraper:

  • You must have an Amazon EKS cluster created.

  • Your Amazon EKS cluster must have cluster endpoint access control set to include private access. It can include private and public, but must include private.
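You can verify the endpoint access configuration for your cluster with the AWS CLI. This is a sketch; replace the cluster name and Region with your own:

aws eks describe-cluster --name cluster-name --region us-west-2 \
    --query 'cluster.resourcesVpcConfig.{private:endpointPrivateAccess,public:endpointPublicAccess}'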

Note

The cluster will be associated with the scraper by its Amazon Resource Name (ARN). If you delete a cluster and then create a new one with the same name, the ARN is reused for the new cluster, and the scraper will attempt to collect metrics from the new cluster. Scrapers are deleted separately from clusters; deleting a cluster does not delete its scrapers.

AWS API

To create a scraper using the AWS API

Use the CreateScraper API operation to create a scraper with the AWS API. The following example creates a scraper in the us-west-2 Region. You need to replace the AWS account, workspace, security, and Amazon EKS cluster information with your own IDs, and provide the configuration to use for your scraper.

Note

The security group and subnets should be set to the security group and subnets for the cluster to which you are connecting.

You must include at least two subnets, in at least two Availability Zones.

The scrapeConfiguration is a Prometheus configuration YAML file that is base64 encoded. You can download a general purpose configuration with the GetDefaultScraperConfiguration API operation. For more information about the format of the scrapeConfiguration, see Scraper configuration.

POST /scrapers HTTP/1.1
Content-Length: 415
Authorization: AUTHPARAMS
X-Amz-Date: 20201201T193725Z
User-Agent: aws-cli/1.18.147 Python/2.7.18 Linux/5.4.58-37.125.amzn2int.x86_64 botocore/1.18.6

{
    "alias": "myScraper",
    "destination": {
        "ampConfiguration": {
            "workspaceArn": "arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id"
        }
    },
    "source": {
        "eksConfiguration": {
            "clusterArn": "arn:aws:eks:us-west-2:account-id:cluster/cluster-name",
            "securityGroupIds": ["sg-security-group-id"],
            "subnetIds": ["subnet-subnet-id-1", "subnet-subnet-id-2"]
        }
    },
    "scrapeConfiguration": {
        "configurationBlob": <base64-encoded-blob>
    }
}
AWS CLI

To create a scraper using the AWS CLI

Use the create-scraper command to create a scraper with the AWS CLI. The following example creates a scraper in the us-west-2 Region. You need to replace the AWS account, workspace, security, and Amazon EKS cluster information with your own IDs, and provide the configuration to use for your scraper.

Note

The security group and subnets should be set to the security group and subnets for the cluster to which you are connecting.

You must include at least two subnets, in at least two Availability Zones.

The scrape-configuration is a Prometheus configuration YAML file that is base64 encoded. You can download a general purpose configuration with the get-default-scraper-configuration command. For more information about the format of the scrape-configuration, see Scraper configuration.

aws amp create-scraper \
    --source eksConfiguration="{clusterArn='arn:aws:eks:us-west-2:account-id:cluster/cluster-name', securityGroupIds=['sg-security-group-id'],subnetIds=['subnet-subnet-id-1', 'subnet-subnet-id-2']}" \
    --scrape-configuration configurationBlob=<base64-encoded-blob> \
    --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id'}"
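For example, you might generate the base64-encoded blob from the default configuration like this. This is a sketch assuming a Linux shell; macOS base64 uses different flags:

# Download the default scraper configuration (returned base64 encoded) and decode it for editing
aws amp get-default-scraper-configuration --query configuration --output text \
    | base64 --decode > scraper-config.yml

# After editing, re-encode the file for use as the configurationBlob
# (-w0 avoids line wrapping on GNU base64)
base64 -w0 scraper-config.yml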

The following is the full list of scraper operations that you can use with the AWS API:

  • CreateScraper

  • DescribeScraper

  • ListScrapers

  • GetDefaultScraperConfiguration

  • DeleteScraper

Note

The Amazon EKS cluster that you are scraping must be configured to allow Amazon Managed Service for Prometheus to access the metrics. The next topic describes how to configure your cluster.

Common errors when creating scrapers

The following are the most common issues when attempting to create a new scraper.

  • Required AWS resources don't exist. The security group, subnet, and Amazon EKS cluster specified must exist.

  • Insufficient IP address space. You must have at least one IP address available in each subnet that you pass into the CreateScraper API.
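You can check the available IP address space of the subnets that you plan to pass to CreateScraper with the AWS CLI. This is a sketch with placeholder subnet IDs:

aws ec2 describe-subnets --subnet-ids subnet-subnet-id-1 subnet-subnet-id-2 \
    --query 'Subnets[].{id:SubnetId,availableIps:AvailableIpAddressCount}'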

Configuring your Amazon EKS cluster

Your Amazon EKS cluster must be configured to allow the scraper to access metrics. There are two options for this configuration:

  • Use Amazon EKS access entries to automatically provide Amazon Managed Service for Prometheus collectors access to your cluster.

  • Manually configure your Amazon EKS cluster for managed metric scraping.

The following topics describe each of these in more detail.

Configure Amazon EKS for scraper access with access entries

Using access entries for Amazon EKS is the easiest way to give Amazon Managed Service for Prometheus access to scrape metrics from your cluster.

The Amazon EKS cluster that you are scraping must be configured to allow API authentication. The cluster authentication mode must be set to either API or API_AND_CONFIG_MAP. This is viewable in the Amazon EKS console on the Access configuration tab of the cluster details. For more information, see Allowing IAM roles or users access to Kubernetes objects on your Amazon EKS cluster in the Amazon EKS User Guide.
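You can also check the authentication mode from the AWS CLI. This is a sketch; substitute your own cluster name:

aws eks describe-cluster --name cluster-name \
    --query 'cluster.accessConfig.authenticationMode' --output text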

You can create the scraper when creating the cluster, or after creating the cluster:

  • When creating a cluster – You can configure this access when you create an Amazon EKS cluster through the Amazon EKS console (follow the instructions to create a scraper as part of the cluster), and an access entry policy will automatically be created, giving Amazon Managed Service for Prometheus access to the cluster metrics.

  • Adding after a cluster is created – If your Amazon EKS cluster already exists, set the authentication mode to either API or API_AND_CONFIG_MAP. Any scrapers that you create through the Amazon Managed Service for Prometheus API or AWS CLI will then automatically have the correct access entry policy created for them, and the scrapers will have access to your cluster.
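For an existing cluster, the authentication mode can be changed with the AWS CLI as well as the console. The following is a sketch; switching to API_AND_CONFIG_MAP keeps any existing aws-auth ConfigMap entries working:

aws eks update-cluster-config --name cluster-name \
    --access-config authenticationMode=API_AND_CONFIG_MAP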

Access entry policy created

When you create a scraper and let Amazon Managed Service for Prometheus generate an access entry policy for you, it generates the following policy. For more information about access entries, see Allowing IAM roles or users access to Kubernetes in the Amazon EKS User Guide.

{ "rules": [ { "effect": "allow", "apiGroups": [ "" ], "resources": [ "nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps" ], "verbs": [ "get", "list", "watch" ] }, { "effect": "allow", "apiGroups": [ "extensions", "networking.k8s.io" ], "resources": [ "ingresses/status", "ingresses" ], "verbs": [ "get", "list", "watch" ] }, { "effect": "allow", "nonResourceURLs": [ "/metrics" ], "verbs": [ "get" ] } ] }

Manually configuring Amazon EKS for scraper access

If you prefer to use the aws-auth ConfigMap to control access to your Kubernetes cluster, you can still give Amazon Managed Service for Prometheus scrapers access to your metrics. The following steps give Amazon Managed Service for Prometheus access to scrape metrics from your Amazon EKS cluster.

Note

For more information about ConfigMap and access entries, see Allowing IAM roles or users access to Kubernetes in the Amazon EKS User Guide.

This procedure uses kubectl and the AWS CLI. For information about installing kubectl, see Installing kubectl in the Amazon EKS User Guide.

To manually configure your Amazon EKS cluster for managed metric scraping
  1. Create a file called clusterrole-binding.yml with the following text:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: aps-collector-role
    rules:
      - apiGroups: [""]
        resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps"]
        verbs: ["describe", "get", "list", "watch"]
      - apiGroups: ["extensions", "networking.k8s.io"]
        resources: ["ingresses/status", "ingresses"]
        verbs: ["describe", "get", "list", "watch"]
      - nonResourceURLs: ["/metrics"]
        verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: aps-collector-user-role-binding
    subjects:
      - kind: User
        name: aps-collector-user
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: aps-collector-role
      apiGroup: rbac.authorization.k8s.io
  2. Run the following command in your cluster:

    kubectl apply -f clusterrole-binding.yml

    This creates the cluster role and the cluster role binding. This example uses aps-collector-role as the role name and aps-collector-user as the user name.

  3. The following command gives you information about the scraper with the ID scraper-id. This is the scraper that you created using the command in the previous section.

    aws amp describe-scraper --scraper-id scraper-id
  4. From the results of the describe-scraper command, find the roleArn. It will have the following format:

    arn:aws:iam::account-id:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_unique-id

    Amazon EKS requires a different format for this ARN. You must adjust the format of the returned ARN to be used in the next step. Edit it to match this format:

    arn:aws:iam::account-id:role/AWSServiceRoleForAmazonPrometheusScraper_unique-id

    For example, this ARN:

    arn:aws:iam::111122223333:role/aws-service-role/scraper.aps.amazonaws.com/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7

    Must be rewritten as:

    arn:aws:iam::111122223333:role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-56ef-7
  5. Run the following command in your cluster, using the modified roleArn from the previous step, as well as your cluster name and Region (steps 3 through 5 are combined in the scripted sketch after this procedure):

    eksctl create iamidentitymapping --cluster cluster-name --region region-id --arn roleArn --username aps-collector-user

    This allows the scraper to access the cluster using the role and user you created in the clusterrole-binding.yml file.
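If you prefer to script steps 3 through 5, the following is a minimal sketch assuming a Linux shell with the AWS CLI, sed, and eksctl installed; the scraper ID, cluster name, and Region are placeholders:

# Step 3: fetch the scraper's service-linked role ARN
ROLE_ARN=$(aws amp describe-scraper --scraper-id scraper-id \
    --query 'scraper.roleArn' --output text)

# Step 4: rewrite the ARN into the format that Amazon EKS expects
EKS_ROLE_ARN=$(echo "$ROLE_ARN" | sed 's|role/aws-service-role/scraper.aps.amazonaws.com/|role/|')

# Step 5: map the role to the aps-collector-user user in the cluster
eksctl create iamidentitymapping --cluster cluster-name --region region-id \
    --arn "$EKS_ROLE_ARN" --username aps-collector-user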

Find and delete scrapers

You can use the AWS API or the AWS CLI to list the scrapers in your account or to delete them.

Note

Make sure that you are using the latest version of the AWS CLI or SDK. The latest version provides the latest features and functionality, as well as security updates. Alternatively, use AWS CloudShell, which automatically provides an always up-to-date command line experience.

To list all the scrapers in your account, use the ListScrapers API operation.

Alternatively, with the AWS CLI, call:

aws amp list-scrapers

ListScrapers returns all of the scrapers in your account, for example:

{ "scrapers": [ { "scraperId": "s-1234abcd-56ef-7890-abcd-1234ef567890", "arn": "arn:aws:aps:us-west-2:123456789012:scraper/s-1234abcd-56ef-7890-abcd-1234ef567890", "roleArn": "arn:aws:iam::123456789012:role/aws-service-role/AWSServiceRoleForAmazonPrometheusScraper_1234abcd-2931", "status": { "statusCode": "DELETING" }, "createdAt": "2023-10-12T15:22:19.014000-07:00", "lastModifiedAt": "2023-10-12T15:55:43.487000-07:00", "tags": {}, "source": { "eksConfiguration": { "clusterArn": "arn:aws:eks:us-west-2:123456789012:cluster/my-cluster", "securityGroupIds": [ "sg-1234abcd5678ef90" ], "subnetIds": [ "subnet-abcd1234ef567890", "subnet-1234abcd5678ab90" ] } }, "destination": { "ampConfiguration": { "workspaceArn": "arn:aws:aps:us-west-2:123456789012:workspace/ws-1234abcd-5678-ef90-ab12-cdef3456a78" } } } ] }

To delete a scraper, find the scraperId for the scraper that you want to delete, using the ListScrapers operation, and then use the DeleteScraper operation to delete it.

Alternatively, with the AWS CLI, call:

aws amp delete-scraper --scraper-id scraperId
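For example, you might combine these operations to delete all scrapers that target a particular cluster. This is a sketch assuming a Linux shell; the cluster ARN is a placeholder:

# Find the IDs of scrapers that target a specific Amazon EKS cluster
SCRAPER_IDS=$(aws amp list-scrapers \
    --query "scrapers[?source.eksConfiguration.clusterArn=='arn:aws:eks:us-west-2:123456789012:cluster/my-cluster'].scraperId" \
    --output text)

# Delete each matching scraper
for id in $SCRAPER_IDS; do
    aws amp delete-scraper --scraper-id "$id"
done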

Scraper configuration

You can control how your scraper discovers and collects metrics with a Prometheus-compatible scraper configuration. For example, you can change how often metrics are sent to the workspace. You can also use relabeling to dynamically rewrite the labels of a metric. The scraper configuration is a YAML file that is part of the definition of the scraper.

When a new scraper is created, you specify a configuration by providing a base64 encoded YAML file in the API call. You can download a general purpose configuration file with the GetDefaultScraperConfiguration operation in the Amazon Managed Service for Prometheus API.

To modify the configuration of a scraper, delete the scraper and recreate it with the new configuration.

Supported configuration

For information about the scraper configuration format, including a detailed breakdown of the possible values, see Configuration in the Prometheus documentation. The <global> configuration options and the <scrape_config> options cover the most commonly needed settings.

Because Amazon EKS is the only supported service, the only service discovery config (<*_sd_config>) supported is the <kubernetes_sd_config>.

The complete list of allowed configuration sections:

  • <global>

  • <scrape_config>

  • <static_config>

  • <relabel_config>

  • <metric_relabel_configs>

  • <kubernetes_sd_config>

Limitations within these sections are listed after the sample configuration file.

Sample configuration file

The following is a sample YAML configuration file with a 30-second scrape interval.

global:
  scrape_interval: 30s
  external_labels:
    clusterArn: apiserver-test-2
scrape_configs:
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
  - job_name: cadvisor
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
  # apiserver metrics
  - scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
  # kube proxy metrics
  - job_name: kube-proxy
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: keep
        source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_pod_name
        separator: '/'
        regex: 'kube-system/kube-proxy.+'
      - source_labels:
          - __address__
        action: replace
        target_label: __address__
        regex: (.+?)(\\:\\d+)?
        replacement: $1:10249

The following are limitations specific to AWS managed collectors:

  • Scrape interval – The scraper config can't specify a scrape interval of less than 30 seconds.

  • Targets – Targets in the static_config must be specified as IP addresses.

  • DNS resolution – Related to target naming, the only server name that is recognized in this configuration is the Kubernetes API server, kubernetes.default.svc. All other machine names must be specified by IP address.

  • Authorization – Omit if no authorization is needed. If it is needed, the authorization must be Bearer, and must point to the file /var/run/secrets/kubernetes.io/serviceaccount/token. In other words, if used, the authorization section must look like the following:

    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    Note

    type: Bearer is the default, so it can be omitted.

Troubleshooting scraper configuration

Amazon Managed Service for Prometheus collectors automatically discover and scrape metrics. But how can you troubleshoot when you don't see a metric you expect to see in your Amazon Managed Service for Prometheus workspace?

The up metric is a helpful tool. For each endpoint that an Amazon Managed Service for Prometheus collector discovers, it automatically vends this metric. There are three states of this metric that can help you to troubleshoot what is happening within the collector.

  • up is not present – If there is no up metric present for an endpoint, then that means that the collector was not able to find the endpoint.

    If you are sure that the endpoint exists, you likely need to adjust the scrape configuration. The discovery relabel_config might need to be adjusted, or it's possible that there is a problem with the role used for discovery.

  • up is present, but is always 0 – If up is present, but 0, then the collector is able to discover the endpoint, but can't find any Prometheus-compatible metrics.

    In this case, you might try using a curl command against the endpoint directly (see the sketch after this list). You can validate that you have the details correct, for example, the protocol (http or https), the endpoint, or the port that you are using. You can also check that the endpoint is responding with a valid 200 response and follows the Prometheus format. Finally, the body of the response can't be larger than the maximum allowed size. (For limits on AWS managed collectors, see the following section.)

  • up is present and greater than 0 – If up is present, and is greater than 0, then metrics are being sent to Amazon Managed Service for Prometheus.

    Validate that you are looking for the correct metrics in Amazon Managed Service for Prometheus (or your alternate dashboard, such as Amazon Managed Grafana). You can use curl again to check for expected data in your /metrics endpoint. Also check that you haven't exceeded other limits, such as the number of endpoints per scraper. You can check the number of metrics endpoints being scraped by checking the count of up metrics, using count(up).
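As a concrete version of these checks, you can query the endpoint directly and inspect the response. This is a sketch; the IP address and port are placeholders for an endpoint in your cluster:

# Confirm the endpoint returns HTTP 200 and Prometheus-format text
curl -sv http://10.0.12.34:8080/metrics | head -n 20

# Confirm the response body is under the 50 MB response size limit
curl -s http://10.0.12.34:8080/metrics | wc -c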

Scraper limitations

There are a few limitations to the fully managed scrapers provided by Amazon Managed Service for Prometheus.

  • Region – Your Amazon EKS cluster, managed scraper, and Amazon Managed Service for Prometheus workspace must all be in the same AWS Region.

  • Account – Your Amazon EKS cluster, managed scraper, and Amazon Managed Service for Prometheus workspace must all be in the same AWS account.

  • Collectors – You can have a maximum of 10 Amazon Managed Service for Prometheus scrapers per Region, per account.

    Note

    You can request an increase to this limit through a service quota increase request.

  • Metrics response – The body of a response from any one /metrics endpoint request cannot be more than 50 megabytes (MB).

  • Endpoints per scraper – A scraper can scrape a maximum of 30,000 /metrics endpoints.

  • Scrape interval – The scraper config can't specify a scrape interval of less than 30 seconds.