Install the CloudWatch agent by using the Amazon CloudWatch Observability EKS add-on

The Amazon CloudWatch Observability EKS add-on installs the CloudWatch agent and the Fluent Bit agent on an Amazon EKS cluster, with Container Insights enhanced observability for Amazon EKS and CloudWatch Application Signals enabled by default. Using the add-on, you can collect infrastructure metrics, application performance telemetry, and container logs from the Amazon EKS cluster.

With Container Insights with enhanced observability for Amazon EKS, Container Insights metrics are charged per observation instead of being charged per metric stored or log ingested. For Application Signals, billing is based on inbound requests to your applications, outbound requests from your applications, and each configured service level objective (SLO). Each inbound request received generates one application signal, and each outbound request made generates one application signal. Every SLO creates two application signals per measurement period. For more information about CloudWatch pricing, see Amazon CloudWatch Pricing.
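As a rough illustration of how Application Signals usage adds up under the rules in the previous paragraph, the following shell arithmetic counts signals for one window of time. The request counts, the number of SLOs, and the number of measurement periods are made-up values, and the result is a signal count only, not a price.

```shell
# Hypothetical workload figures (assumptions for illustration only).
inbound_requests=1000      # requests received by your applications
outbound_requests=400      # requests made by your applications
slo_count=3                # configured service level objectives
measurement_periods=24     # SLO measurement periods in the window

# One signal per inbound request, one per outbound request,
# and two signals per SLO for each measurement period.
signals=$(( inbound_requests + outbound_requests + 2 * slo_count * measurement_periods ))

echo "Application signals in this window: ${signals}"
```

With these example numbers the total is 1000 + 400 + (2 x 3 x 24) = 1544 signals.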

The Amazon EKS add-on enables Container Insights on both Linux and Windows worker nodes in the Amazon EKS cluster. To enable Container Insights on Windows, you must use version 1.5.0 or later of the Amazon EKS add-on. Currently, Application Signals is not supported on Windows in Amazon EKS clusters.

The Amazon CloudWatch Observability EKS add-on is supported on Amazon EKS clusters running with Kubernetes version 1.23 or later.
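The two version requirements above (Kubernetes 1.23 or later for the add-on, and add-on version 1.5.0 or later for Container Insights on Windows) can be checked with a small comparison helper. This is a minimal sketch that assumes a sort implementation with the -V (version sort) option, such as GNU sort. The helper name version_ge and the example version values are hypothetical.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes sort supports -V (version sort), as GNU sort does.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example values (assumptions): substitute the versions used by your cluster.
addon_version="1.5.0"
kubernetes_version="1.29"

if version_ge "$kubernetes_version" "1.23"; then
  echo "Kubernetes ${kubernetes_version} meets the 1.23 minimum for the add-on"
fi
if version_ge "$addon_version" "1.5.0"; then
  echo "Add-on ${addon_version} can enable Container Insights on Windows nodes"
fi
```

Version strings reported by your tooling might carry a prefix or a build suffix. If so, reduce them to the plain numeric form before comparing.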

When you install the add-on, you must also grant IAM permissions to enable the CloudWatch agent to send metrics, logs, and traces to CloudWatch. There are two ways to do this:

  • Attach a policy to the IAM role of your worker nodes. This option grants permissions to worker nodes to send telemetry to CloudWatch.

  • Use an IAM role for service accounts for the agent pods, and attach the policy to this role. This works only for Amazon EKS clusters. This option gives CloudWatch access only to the appropriate agent pods.

Option 1: Install with IAM permissions on worker nodes

To use this method, first attach the CloudWatchAgentServerPolicy IAM policy to your worker nodes by entering the following command. In this command, replace my-worker-node-role with the IAM role used by your Kubernetes worker nodes.

aws iam attach-role-policy \
  --role-name my-worker-node-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
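To confirm that the policy is attached before you install the add-on, you can list the managed policies on the role and look for CloudWatchAgentServerPolicy. The following is a sketch only: the function names are hypothetical, and attached_policy_names requires the AWS CLI with credentials that are allowed to call iam:ListAttachedRolePolicies.

```shell
# Succeeds when CloudWatchAgentServerPolicy appears in a tab- or
# newline-separated list of policy names read from standard input.
has_agent_policy() {
  tr '\t' '\n' | grep -qx 'CloudWatchAgentServerPolicy'
}

# Prints the names of the managed policies attached to the role given as $1
# (requires the AWS CLI and valid credentials).
attached_policy_names() {
  aws iam list-attached-role-policies \
    --role-name "$1" \
    --query 'AttachedPolicies[].PolicyName' \
    --output text
}
```

For example, run attached_policy_names my-worker-node-role | has_agent_policy && echo "policy attached", replacing my-worker-node-role with your role name.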

Then install the Amazon CloudWatch Observability EKS add-on. To install the add-on, you can use the AWS CLI, the console, AWS CloudFormation, or Terraform.

AWS CLI
To use the AWS CLI to install the Amazon CloudWatch Observability EKS add-on

Enter the following command. Replace my-cluster-name with the name of your cluster.

aws eks create-addon --addon-name amazon-cloudwatch-observability --cluster-name my-cluster-name
Amazon EKS console
To use the Amazon EKS console to add the Amazon CloudWatch Observability EKS add-on
  1. Open the Amazon EKS console at https://console.aws.amazon.com/eks/home#/clusters.

  2. In the left navigation pane, choose Clusters.

  3. Choose the name of the cluster that you want to configure the Amazon CloudWatch Observability EKS add-on for.

  4. Choose the Add-ons tab.

  5. Choose Get more add-ons.

  6. On the Select add-ons page, do the following:

    1. In the Amazon EKS-addons section, select the Amazon CloudWatch Observability check box.

    2. Choose Next.

  7. On the Configure selected add-ons settings page, do the following:

    1. Select the Version you'd like to use.

    2. For Select IAM role, select Inherit from node.

    3. (Optional) You can expand the Optional configuration settings. If you select Override for the Conflict resolution method, one or more of the settings for the existing add-on can be overwritten with the Amazon EKS add-on settings. If you don't enable this option and there's a conflict with your existing settings, the operation fails. You can use the resulting error message to troubleshoot the conflict. Before selecting this option, make sure that the Amazon EKS add-on doesn't manage settings that you need to self-manage.

    4. Choose Next.

  8. On the Review and add page, choose Create. After the add-on installation is complete, you see your installed add-on.

AWS CloudFormation
To use AWS CloudFormation to install the Amazon CloudWatch Observability EKS add-on

Replace my-cluster-name with the name of your cluster. For more information, see AWS::EKS::Addon.

{
  "Resources": {
    "EKSAddOn": {
      "Type": "AWS::EKS::Addon",
      "Properties": {
        "AddonName": "amazon-cloudwatch-observability",
        "ClusterName": "my-cluster-name"
      }
    }
  }
}

Terraform
To use Terraform to install the Amazon CloudWatch Observability EKS add-on

Replace my-cluster-name with the name of your cluster. For more information, see Resource: aws_eks_addon.

resource "aws_eks_addon" "example" {
  addon_name   = "amazon-cloudwatch-observability"
  cluster_name = "my-cluster-name"
}
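Whichever installation method you use, you can check from the command line whether the add-on has finished installing. The following sketch queries the add-on status with aws eks describe-addon and treats ACTIVE as ready. The function names are hypothetical, and addon_status requires the AWS CLI with valid credentials.

```shell
# Prints the add-on status reported by Amazon EKS for the cluster given as $1
# (requires the AWS CLI and valid credentials).
addon_status() {
  aws eks describe-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --query 'addon.status' \
    --output text
}

# Succeeds only when the status string given as $1 is ACTIVE.
addon_is_active() {
  [ "$1" = "ACTIVE" ]
}
```

For example, run addon_is_active "$(addon_status my-cluster-name)" && echo "add-on is ready". Any status other than ACTIVE means that the add-on is still being created or needs attention.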

Option 2: Install using IAM service account role

Before using this method, verify the following prerequisites:

  • You have a functional Amazon EKS cluster with nodes attached in one of the AWS Regions that supports Container Insights. For the list of supported Regions, see Container Insights.

  • You have kubectl installed and configured for the cluster. For more information, see Installing kubectl in the Amazon EKS User Guide.

  • You have eksctl installed. For more information, see Installing or updating eksctl in the Amazon EKS User Guide.

To install the Amazon CloudWatch Observability EKS add-on using the IAM service account role
  1. Enter the following command to create an OpenID Connect (OIDC) provider, if the cluster doesn't have one already. For more information, see Configuring a Kubernetes service account to assume an IAM role in the Amazon EKS User Guide.

    eksctl utils associate-iam-oidc-provider --cluster my-cluster-name --approve
  2. Enter the following command to create the IAM role with the CloudWatchAgentServerPolicy policy attached, and configure the agent service account to assume that role using OIDC. Replace my-cluster-name with the name of your cluster, and replace my-service-account-role with the name of the role that you want to associate the service account with. If the role doesn't already exist, eksctl creates it for you.

    eksctl create iamserviceaccount \
      --name cloudwatch-agent \
      --namespace amazon-cloudwatch \
      --cluster my-cluster-name \
      --role-name my-service-account-role \
      --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy \
      --role-only \
      --approve
  3. Install the add-on by entering the following command. Replace my-cluster-name with the name of your cluster, replace 111122223333 with your account ID, and replace my-service-account-role with the IAM role created in the previous step.

    aws eks create-addon --addon-name amazon-cloudwatch-observability --cluster-name my-cluster-name --service-account-role-arn arn:aws:iam::111122223333:role/my-service-account-role
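The role ARN passed to --service-account-role-arn in step 3 is built from your account ID and the role name. The following sketch assembles it so that you don't have to type it by hand. The function names are hypothetical, the ARN format assumes the standard aws partition, and current_account_id requires the AWS CLI with valid credentials.

```shell
# Builds the ARN of an IAM role from an account ID ($1) and a role name ($2).
# Assumes the standard "aws" partition.
role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

# Prints the account ID of the current credentials (requires the AWS CLI).
current_account_id() {
  aws sts get-caller-identity --query 'Account' --output text
}

# With the placeholder values used in the steps above:
role_arn 111122223333 my-service-account-role
```

To use your own account, pass "$(role_arn "$(current_account_id)" my-service-account-role)" as the value of --service-account-role-arn.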

(Optional) Additional configuration

Opt out of collecting container logs

By default, the add-on uses Fluent Bit to collect container logs from all pods and then sends the logs to CloudWatch Logs. For information about which logs are collected, see Setting up Fluent Bit.

To opt out of the collection of container logs, pass the following option when you create or update the add-on:

--configuration-values '{ "containerLogs": { "enabled": false } }'

Opt out of NVIDIA GPU metric collection

Beginning with version 1.300034.0 of the CloudWatch agent, Container Insights collects NVIDIA GPU metrics from EKS workloads by default. These metrics are listed in the table in NVIDIA GPU metrics.

You can opt out of collecting NVIDIA GPU metrics by setting the accelerated_compute_metrics option in the CloudWatch agent configuration file to false. This option is in the kubernetes section of the metrics_collected section in the CloudWatch configuration file. The following is an example of an opt-out configuration.

{
  "agent": {
    "region": "us-east-1"
  },
  "logs": {
    "metrics_collected": {
      "emf": { },
      "kubernetes": {
        "enhanced_container_insights": true,
        "accelerated_compute_metrics": false
      }
    },
    "force_flush_interval": 5
  }
}

Use a custom CloudWatch agent configuration

To collect other metrics, logs or traces using the CloudWatch agent, you can specify a custom configuration while also keeping Container Insights and CloudWatch Application Signals enabled. To do so, embed the CloudWatch agent configuration file within the config key under the agent key of the advanced configuration that you can use when creating or updating the EKS add-on. The following represents the default agent configuration when you do not provide any additional configuration.

Important

Any custom configuration that you provide using additional configuration settings overrides the default configuration used by the agent. Be cautious not to unintentionally disable functionality that is enabled by default, such as Container Insights with enhanced observability and CloudWatch Application Signals. In the scenario that you are required to provide a custom agent configuration, we recommend using the following default configuration as a baseline and then modifying it accordingly.

--configuration-values '{
  "agent": {
    "config": {
      "logs": {
        "metrics_collected": {
          "app_signals": {},
          "kubernetes": {
            "enhanced_container_insights": true
          }
        }
      },
      "traces": {
        "traces_collected": {
          "app_signals": {}
        }
      }
    }
  }
}'
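For a larger custom configuration, it can be easier to keep the values in a file and pass the file to the CLI with the file:// prefix. The following sketch starts from the default configuration and adds the accelerated_compute_metrics opt-out under the kubernetes section. Placing that key inside the embedded agent configuration this way is an assumption drawn from the two sections above. The file name and the braces_balanced helper are hypothetical, and the helper is a crude sanity check, not JSON validation.

```shell
# Hypothetical file name for the add-on configuration values.
config_file="cw-addon-config.json"

# Default agent configuration as a baseline, with NVIDIA GPU metrics turned off.
cat > "$config_file" <<'EOF'
{
  "agent": {
    "config": {
      "logs": {
        "metrics_collected": {
          "app_signals": {},
          "kubernetes": {
            "enhanced_container_insights": true,
            "accelerated_compute_metrics": false
          }
        }
      },
      "traces": {
        "traces_collected": {
          "app_signals": {}
        }
      }
    }
  }
}
EOF

# Crude check: succeeds when the file has as many "{" as "}" characters.
# It does not validate JSON and miscounts braces inside string values.
braces_balanced() {
  opens=$(( $(tr -cd '{' < "$1" | wc -c) ))
  closes=$(( $(tr -cd '}' < "$1" | wc -c) ))
  [ "$opens" -eq "$closes" ]
}

braces_balanced "$config_file" && echo "${config_file}: braces are balanced"
```

You can then pass --configuration-values file://cw-addon-config.json when you create or update the add-on. A real JSON parser gives a stricter check than counting braces.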

The following example shows the default agent configuration for the CloudWatch agent on Windows. The CloudWatch agent on Windows does not support custom configuration.

{
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "enhanced_container_insights": true
      }
    }
  }
}

Manage admission webhook TLS certificates

The Amazon CloudWatch Observability EKS add-on leverages Kubernetes admission webhooks to validate and mutate AmazonCloudWatchAgent and Instrumentation custom resource (CR) requests, and optionally Kubernetes pod requests on the cluster if CloudWatch Application Signals is enabled. In Kubernetes, webhooks require a TLS certificate that the API server is configured to trust in order to ensure secure communication.

By default, the Amazon CloudWatch Observability EKS add-on auto-generates a self-signed CA and a TLS certificate signed by this CA for securing the communication between the API server and the webhook server. This auto-generated certificate has a default expiry of 10 years and is not auto-renewed upon expiry. In addition, the CA bundle and the certificate are re-generated every time the add-on is upgraded or re-installed, thus resetting the expiry. If you want to change the default expiry of the auto-generated certificate, you can use the following additional configurations when creating or updating the add-on. Replace expiry-in-days with your desired expiry duration in days.

--configuration-values '{ "admissionWebhooks": { "autoGenerateCert": { "expiryDays": expiry-in-days } } }'
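Because expiry-in-days must be replaced with a number, it can help to generate the configuration value instead of editing the JSON by hand. The following sketch does that with a hypothetical helper named webhook_expiry_config. The 365-day value is only an example.

```shell
# Prints configuration values that set the auto-generated certificate
# expiry to the number of days given as $1.
webhook_expiry_config() {
  printf '{ "admissionWebhooks": { "autoGenerateCert": { "expiryDays": %d } } }\n' "$1"
}

# Example: a one-year expiry.
webhook_expiry_config 365
```

You can pass the result directly, for example --configuration-values "$(webhook_expiry_config 365)".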

For a more secure and feature-rich certificate authority solution, the add-on has opt-in support for cert-manager, a widely-adopted solution for TLS certificate management in Kubernetes that simplifies the process of obtaining, renewing, managing and using those certificates. It ensures that certificates are valid and up to date, and attempts to renew certificates at a configured time before expiry. cert-manager also facilitates issuing certificates from a variety of supported sources, including AWS Certificate Manager Private Certificate Authority.

We recommend that you review best practices for managing TLS certificates on your clusters, and that you opt in to cert-manager for production environments. If you opt in to cert-manager for managing the admission webhook TLS certificates, you must install cert-manager on your Amazon EKS cluster before you install the Amazon CloudWatch Observability EKS add-on. For the available installation options, see the cert-manager documentation. After cert-manager is installed, opt in by passing the following additional configuration when you create or update the add-on.

--configuration-values '{ "admissionWebhooks": { "certManager": { "enabled": true } } }'

The advanced configuration discussed in this section will by default use a SelfSigned issuer.
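Because cert-manager must be installed before you enable this option, a quick partial check is to see whether its Certificate custom resource definition exists on the cluster. The following sketch assumes that kubectl is configured for the cluster. The function name is hypothetical, and finding the CRD does not prove that cert-manager itself is running correctly.

```shell
# Succeeds when the cert-manager Certificate CRD exists on the cluster
# (requires kubectl configured for the cluster).
cert_manager_installed() {
  kubectl get crd certificates.cert-manager.io > /dev/null 2>&1
}
```

For example, run cert_manager_installed || echo "install cert-manager first" before you enable the certManager option.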

Collecting Amazon EBS volume IDs

If you want to collect Amazon EBS volume IDs in the performance logs, you must add another policy to the IAM role that is attached to the worker nodes or to the service account. Add the following as an inline policy. For more information, see Adding and Removing IAM Identity Permissions.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "ec2:DescribeVolumes"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
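If you prefer the AWS CLI to the console for adding the inline policy, one option is aws iam put-role-policy. The following sketch assumes that you saved the policy document above to a local file. The function name, the file name in the usage example, and the inline policy name CloudWatchAgentDescribeVolumes are all hypothetical.

```shell
# Adds the policy document in the file given as $2 to the role given as $1
# as an inline policy (requires the AWS CLI and valid credentials).
# The inline policy name is a hypothetical choice.
add_ebs_volume_policy() {
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name CloudWatchAgentDescribeVolumes \
    --policy-document "file://$2"
}
```

For example, run add_ebs_volume_policy my-worker-node-role ebs-volume-policy.json, using the worker node role or the service account role, whichever the agent uses.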

Troubleshooting the Amazon CloudWatch Observability EKS add-on

Use the following information to help troubleshoot problems with the Amazon CloudWatch Observability EKS add-on.

Updating and deleting the Amazon CloudWatch Observability EKS add-on

For instructions about updating or deleting the Amazon CloudWatch Observability EKS add-on, see Managing Amazon EKS add-ons. Use amazon-cloudwatch-observability as the name of the add-on.

Verify version of the CloudWatch agent used by the Amazon CloudWatch Observability EKS add-on

The Amazon CloudWatch Observability EKS add-on installs a custom resource of kind AmazonCloudWatchAgent that controls the behavior of the CloudWatch agent daemonset on the cluster, including the version of the CloudWatch agent being used. You can get a list of all the AmazonCloudWatchAgent custom resources installed on your cluster by entering the following command:

kubectl get amazoncloudwatchagent -A

In the output of this command, you should be able to check the version of the CloudWatch agent. Alternatively, you can also describe the amazoncloudwatchagent resource or one of the cloudwatch-agent-* pods running on your cluster to inspect the image being used.
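To inspect the image through the pods instead, you can print each pod with its container images and read the version from the image tag. The following sketch assumes that the agent pods run in the amazon-cloudwatch namespace and that the image reference ends in a tag rather than a digest. The function names and the image reference used in the example are hypothetical.

```shell
# Lists each pod in the amazon-cloudwatch namespace with its container images
# (requires kubectl configured for the cluster).
agent_pod_images() {
  kubectl get pods -n amazon-cloudwatch \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Prints the tag portion of an image reference of the form repository:tag.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Example with a made-up image reference:
image_tag "example.com/cloudwatch-agent:1.300034.0"
```

The image_tag helper strips everything up to the last colon, so it is not suitable for references that are pinned by digest or that have no tag.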

Handling a ConfigurationConflict when managing the add-on

When you install or update the Amazon CloudWatch Observability EKS add-on, the operation might fail with a health issue of type ConfigurationConflict and a description that starts with "Conflicts found when trying to apply. Will not continue due to resolve conflicts mode". This usually means that the CloudWatch agent and its associated components, such as the ServiceAccount, the ClusterRole, and the ClusterRoleBinding, are already installed on the cluster. When the add-on tries to install these components and detects a difference in their contents, by default it fails the installation or update to avoid overwriting the state of the resources on the cluster.

If you are trying to onboard to the Amazon CloudWatch Observability EKS add-on and you see this failure, we recommend deleting any existing CloudWatch agent setup that you previously installed on the cluster and then installing the EKS add-on. Be sure to back up any customizations that you made to the original CloudWatch agent setup, such as a custom agent configuration, and provide them to the Amazon CloudWatch Observability EKS add-on when you next install or update it. If you previously installed the CloudWatch agent for onboarding to Container Insights, see Deleting the CloudWatch agent and Fluent Bit for Container Insights for more information.

Alternatively, the add-on supports a conflict resolution option that you can set to OVERWRITE. With this setting, the installation or update proceeds and overwrites the conflicting resources on the cluster. In the Amazon EKS console, the Conflict resolution method appears under Optional configuration settings when you create or update the add-on. With the AWS CLI, add --resolve-conflicts OVERWRITE to the command that creates or updates the add-on.
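As a sketch of the AWS CLI form, the following function wraps an update of the add-on with the conflict resolution option set to OVERWRITE. The function name is hypothetical, and running it requires the AWS CLI with valid credentials. Because this overwrites resources on the cluster, back up any customizations first, as described above.

```shell
# Updates the add-on on the cluster given as $1 and overwrites
# conflicting resources instead of failing on them.
update_addon_overwrite() {
  aws eks update-addon \
    --cluster-name "$1" \
    --addon-name amazon-cloudwatch-observability \
    --resolve-conflicts OVERWRITE
}
```

For example, run update_addon_overwrite my-cluster-name, replacing my-cluster-name with the name of your cluster.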