Configure CloudWatch agent for Amazon EMR 7.1.0

Starting with Amazon EMR 7.1.0, you can use the Amazon EMR configuration API to configure the Amazon CloudWatch agent to collect additional system metrics, add application metrics, and change the metrics destination. For more information about how to use the EMR configuration API to configure your cluster’s applications, see Configure applications.

Note

Amazon EMR 7.1.0 supports only the OVERWRITE reconfiguration type. For more information about the reconfiguration types, see Considerations when you reconfigure an instance group.
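As a rough sketch of what such a reconfiguration request could look like, the following Python snippet builds the keyword arguments for the Amazon EMR ModifyInstanceGroups operation with the reconfiguration type set to OVERWRITE. The helper name build_reconfiguration_request and the cluster and instance group IDs are placeholders invented for illustration, and the snippet only builds the request; it doesn't call AWS.

```python
import json


def build_reconfiguration_request(cluster_id, instance_group_id, configurations):
    """Hypothetical helper: build kwargs for an EMR ModifyInstanceGroups call."""
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                # Per the note above, 7.1.0 supports only OVERWRITE here.
                "ReconfigurationType": "OVERWRITE",
                "Configurations": configurations,
            }
        ],
    }


# An emr-metrics classification with no properties and no nested classifications.
emr_metrics = [
    {"Classification": "emr-metrics", "Properties": {}, "Configurations": []}
]

# The IDs below are placeholders, not real resources.
request = build_reconfiguration_request(
    "j-EXAMPLECLUSTER", "ig-EXAMPLEGROUP", emr_metrics
)
print(json.dumps(request, indent=2))

# Assuming boto3 is installed and credentials are configured, the request
# could then be sent with:
#   import boto3
#   boto3.client("emr").modify_instance_groups(**request)
```

Building the request separately from sending it makes it easy to review the full configuration before it overwrites the existing one.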

Configuration schema

emr-metrics has the following classifications:

  • emr-system-metrics: configure system metrics, such as CPU, disk, and memory.

  • emr-hadoop-hdfs-datanode-metrics: configure Hadoop DataNode JMX metrics.

  • emr-hadoop-hdfs-namenode-metrics: configure Hadoop NameNode JMX metrics.

  • emr-hadoop-yarn-nodemanager-metrics: configure YARN NodeManager JMX metrics.

  • emr-hadoop-yarn-resourcemanager-metrics: configure YARN ResourceManager JMX metrics.

  • emr-hbase-master-metrics: configure HBase Master JMX metrics.

  • emr-hbase-region-server-metrics: configure HBase Region Server JMX metrics.

  • emr-hbase-rest-server-metrics: configure HBase REST Server JMX metrics.

  • emr-hbase-thrift-server-metrics: configure HBase Thrift Server JMX metrics.
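To illustrate how these classifications nest, the following Python sketch wraps one or more nested classifications in a top-level emr-metrics entry and rejects names that aren't in the list above. The helper name build_emr_metrics and the validation step are assumptions made for illustration; they aren't part of the Amazon EMR API.

```python
import json

# Nested classifications listed in this section.
NESTED_CLASSIFICATIONS = {
    "emr-system-metrics",
    "emr-hadoop-hdfs-datanode-metrics",
    "emr-hadoop-hdfs-namenode-metrics",
    "emr-hadoop-yarn-nodemanager-metrics",
    "emr-hadoop-yarn-resourcemanager-metrics",
    "emr-hbase-master-metrics",
    "emr-hbase-region-server-metrics",
    "emr-hbase-rest-server-metrics",
    "emr-hbase-thrift-server-metrics",
}


def build_emr_metrics(nested=None, properties=None):
    """Hypothetical helper: wrap nested classifications in emr-metrics."""
    nested = nested or []
    for item in nested:
        name = item["Classification"]
        if name not in NESTED_CLASSIFICATIONS:
            raise ValueError(f"Unknown emr-metrics classification: {name}")
    return [
        {
            "Classification": "emr-metrics",
            "Properties": properties or {},
            "Configurations": nested,
        }
    ]


config = build_emr_metrics(
    [
        {
            "Classification": "emr-hbase-master-metrics",
            "Properties": {},
            "Configurations": [],
        }
    ]
)
print(json.dumps(config, indent=2))
```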

The following tables describe the available properties and configurations for all of the classifications.

emr-metrics properties

Property | Required | Description | Default value | Possible values | Notes
metrics_destination | Optional | Determines whether cluster metrics are published to Amazon CloudWatch or Amazon Managed Service for Prometheus. | "CLOUDWATCH" | "CLOUDWATCH", "PROMETHEUS" | This property is case-insensitive. For example, "Cloudwatch" is the same as "CLOUDWATCH".
prometheus_endpoint | Optional | If metrics_destination is set to "PROMETHEUS", this property configures the CloudWatch agent to send metrics to the provided Amazon Managed Service for Prometheus remote write endpoint. | N/A | Any valid Amazon Managed Service for Prometheus remote write URL. | The remote write URL format is https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace_id>/api/v1/remote_write. This field is required if metrics_destination is set to "PROMETHEUS". Provisioning will fail if you don't provide a key or if the value is an empty string.
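The rules in this table can be sketched in a few lines of Python: the destination is compared case-insensitively, and a non-empty prometheus_endpoint is required when the destination is "PROMETHEUS". The function names and the sample Region and workspace ID below are assumptions for illustration only; the checks mirror the table and aren't the validation that Amazon EMR itself runs.

```python
def remote_write_url(region, workspace_id):
    """Fill in the remote write URL format shown in the table."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/remote_write"
    )


def build_emr_metrics_properties(destination="CLOUDWATCH", prometheus_endpoint=None):
    """Hypothetical helper: build the Properties map for emr-metrics."""
    normalized = destination.upper()  # the property is case-insensitive
    if normalized not in ("CLOUDWATCH", "PROMETHEUS"):
        raise ValueError(f"Unsupported metrics_destination: {destination}")
    properties = {"metrics_destination": normalized}
    if normalized == "PROMETHEUS":
        # A missing key or an empty string would cause provisioning to fail.
        if not prometheus_endpoint:
            raise ValueError("prometheus_endpoint is required for PROMETHEUS")
        properties["prometheus_endpoint"] = prometheus_endpoint
    return properties


# "us-east-1" and "ws-EXAMPLE" are placeholder values.
endpoint = remote_write_url("us-east-1", "ws-EXAMPLE")
print(build_emr_metrics_properties("prometheus", endpoint))
print(build_emr_metrics_properties("Cloudwatch"))
```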

emr-system-metrics properties

Property | Required | Description | Default value | Possible values | Notes
metrics_collection_interval | Optional | How often in seconds metrics are collected and published from the CloudWatch agent. | "60" | A string specifying the number of seconds. Only accepts whole numbers. | You can override this property with the metrics_collection_interval property from individual metric groups.

emr-system-metrics configurations

cpu
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of CPU metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid CPU metric names with or without the cpu_ prefix, such as usage_active and cpu_time_idle. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any CPU metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish CPU metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for CPU metrics.
drop_original_metrics | Optional | List of CPU metrics for which to not publish unaggregated metrics. | No unaggregated CPU metrics published. | A comma-separated list of CPU metrics that are also specified in the metrics property. An empty string means to publish all CPU metrics. | The CloudWatch agent aggregates all metrics by cluster ID, instance ID, node type, and service name. By default, the CloudWatch agent doesn't publish the per-resource metrics for metrics with multiple resources.
resources | Optional | Determines whether the agent will publish per-core metrics. | "*" | "*" enables per-core metrics. "" disables per-core metrics. | The CloudWatch agent only publishes per-core metrics for CPU metrics that aren't dropped in drop_original_metrics.
disk
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of disk metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid disk metric names with or without the disk_ prefix, such as disk_total and used_percent. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any disk metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish disk metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for disk metrics.
drop_original_metrics | Optional | List of disk metrics for which to not publish unaggregated metrics. | No unaggregated disk metrics published. | A comma-separated list of disk metrics that are also specified in the metrics property. An empty string means to publish all disk metrics. | The CloudWatch agent aggregates all metrics by cluster ID, instance ID, node type, and service name. By default, the CloudWatch agent doesn't publish the per-resource metrics for metrics with multiple resources.
resources | Optional | Determines whether the agent will publish per-mount-point metrics. | "*" | "*" means all mount points, "" means no mount points, or a comma-separated list of mount points. For example, "/,/emr". | The CloudWatch agent only publishes per-mount-point metrics for disk metrics that aren't dropped in drop_original_metrics.
diskio
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of disk IO metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid disk IO metric names with or without the diskio_ prefix, such as diskio_reads and writes. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any disk IO metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish disk IO metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for disk IO metrics.
drop_original_metrics | Optional | List of disk IO metrics for which to not publish unaggregated metrics. | No unaggregated disk IO metrics published. | A comma-separated list of disk IO metrics that are also specified in the metrics property. An empty string means to publish all disk IO metrics. | The CloudWatch agent aggregates all metrics by cluster ID, instance ID, node type, and service name. By default, the CloudWatch agent doesn't publish the per-resource metrics for metrics with multiple resources.
resources | Optional | Determines whether the agent will publish per-device metrics. | "*" | "*" means all storage devices, "" means no storage devices, or a comma-separated list of device names. For example, "nvme0n1,nvme1n1". | The CloudWatch agent only publishes per-device metrics for disk IO metrics that aren't dropped in drop_original_metrics.
mem
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of memory metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid memory metric names with or without the mem_ prefix, such as mem_available and available_percent. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any memory metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish memory metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for memory metrics.
net
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of network metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid network metric names with or without the net_ prefix, such as net_packets_sent and packets_recv. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any network metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish network metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for network metrics.
drop_original_metrics | Optional | List of network metrics for which to not publish unaggregated metrics. | No unaggregated network metrics published. | A comma-separated list of network metrics that are also specified in the metrics property. An empty string means to publish all network metrics. | The CloudWatch agent aggregates all metrics by cluster ID, instance ID, node type, and service name. By default, the CloudWatch agent doesn't publish the per-resource metrics for metrics with multiple resources.
resources | Optional | Determines whether the agent will publish per-interface metrics. | "*" | "*" means all network interfaces, "" means no network interfaces, or a comma-separated list of interface names. For example, "eth0,eth1". | The CloudWatch agent only publishes per-interface metrics for network metrics that aren't dropped in drop_original_metrics.
netstat
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of network statistics metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid network statistics metric names with or without the netstat_ prefix, such as tcp_listen and netstat_udp_socket. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any network statistics metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish network statistics metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for network statistics metrics.
processes
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of process metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid process metric names with or without the processes_ prefix, such as processes_running and total. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any process metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish process metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for process metrics.
swap
Property | Required | Description | Default value | Possible values | Notes
metrics | Optional | The list of swap metrics for the agent to collect. | See Default metrics for CloudWatch agent with Amazon EMR. | A comma-separated list of valid swap metric names with or without the swap_ prefix, such as swap_free and used_percent. See Metrics collected by the CloudWatch agent for valid metrics. | Specifying an empty string means to not publish any swap metrics.
metrics_collection_interval | Optional | How often in seconds the agent should collect and publish swap metrics. | The value of the global metrics_collection_interval. | A string specifying the number of seconds. Accepts only whole numbers. | This value overrides the global metrics_collection_interval property only for swap metrics.
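Each metrics property accepts names with or without the group prefix. The following Python sketch shows one way to picture that rule by expanding a comma-separated value into fully prefixed names. The function normalize_metric_names is hypothetical and only illustrates the naming convention; it doesn't describe how the CloudWatch agent parses the value internally.

```python
def normalize_metric_names(group, metrics):
    """Hypothetical helper: return metric names with the group prefix applied.

    group is a classification name such as "cpu" or "netstat";
    metrics is the comma-separated string from the metrics property.
    """
    prefix = group + "_"
    names = [name.strip() for name in metrics.split(",") if name.strip()]
    return [name if name.startswith(prefix) else prefix + name for name in names]


# Examples taken from the "Possible values" columns above.
print(normalize_metric_names("cpu", "usage_active,cpu_time_idle"))
print(normalize_metric_names("netstat", "tcp_listen,netstat_udp_socket"))

# An empty string yields no names, which matches "don't publish any metrics".
print(normalize_metric_names("swap", ""))
```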

emr-hadoop-hdfs-datanode-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=DataNode,name=DataNodeActivity. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BlocksCached,BlocksRead.
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop DataNode metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hadoop-hdfs-namenode-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=NameNode,name=FSNamesystem. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BlockCapacity,CapacityUsedGB.
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop NameNode metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hadoop-yarn-nodemanager-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=NodeManager,name=NodeManagerMetrics. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, MaxCapacity,AllocatedGB.
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop YARN NodeManager metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hadoop-yarn-resourcemanager-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=ResourceManager,name=PartitionQueueMetrics. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, MaxCapacity,MaxCapacityVCores.
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop YARN ResourceManager metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hbase-master-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=Master,sub=AssignmentManager. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, AssignFailedCount,AssignSubmittedCount.
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Master metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hbase-region-server-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=RegionServer,sub=IPC. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, numActiveHandler,numActivePriorityHandler.
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Region Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hbase-rest-server-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=REST. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, successfulPut,successfulScanCount.
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase REST Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

emr-hbase-thrift-server-metrics properties

Property | Required | Description | Default value | Possible values
<custom_bean_name> | Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=Thrift,sub=ThriftOne. You can find sample MBean names and their corresponding metrics in the example JMX YAML files for Amazon EMR release 7.0. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BatchGet_max,BatchGet_mean.
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Thrift Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers.

System metrics configurations examples

The following example demonstrates how to configure the CloudWatch agent to stop exporting all system metrics.

[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-system-metrics",
        "Properties": {},
        "Configurations": []
      }
    ]
  }
]

The following example configures the CloudWatch agent to export the default system metrics. Doing so is a quick way to reset the agent back to only exporting the default system metrics if you've already reconfigured the system metrics at least once. This reset also removes any application metrics that were reconfigured before.

[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": []
  }
]

The following example configures the cluster to export the cpu, mem, and disk metrics.

[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-system-metrics",
        "Properties": {
          "metrics_collection_interval": "20"
        },
        "Configurations": [
          {
            "Classification": "cpu",
            "Properties": {
              "metrics": "cpu_usage_guest,cpu_usage_idle",
              "metrics_collection_interval": "30",
              "drop_original_metrics": "cpu_usage_guest"
            }
          },
          {
            "Classification": "mem",
            "Properties": {
              "metrics": "mem_active"
            }
          },
          {
            "Classification": "disk",
            "Properties": {
              "metrics": "disk_used_percent",
              "resources": "/,/mnt",
              "drop_original_metrics": ""
            }
          }
        ]
      }
    ]
  }
]

The previous example configuration has the following properties:

  • Every 30 seconds, the agent collects the cpu_usage_guest metric for all CPUs. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. The agent doesn't publish the per-CPU metrics, because the drop_original_metrics property contains cpu_usage_guest.

  • Every 30 seconds, the agent collects the cpu_usage_idle metric for all CPUs. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. The agent also publishes the per-CPU metrics, which you can find in the same namespace. The agent publishes them because the drop_original_metrics property doesn't contain cpu_usage_idle.

  • Every 20 seconds, the agent collects the mem_active metric. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name.

  • Every 20 seconds, the agent collects the disk_used_percent metric for the / and /mnt disk mounts. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. The agent also publishes the per-mount metrics, which you can find in the same namespace. The agent publishes them because the drop_original_metrics property doesn't contain disk_used_percent.
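The reasoning in this list can be reproduced with a short Python sketch that reads the emr-system-metrics block from the example and derives, for each metric group, the effective collection interval and the metrics that keep their per-resource (unaggregated) series. The function names are hypothetical, and the logic is only an interpretation of the property tables in this topic: it assumes that a group without drop_original_metrics publishes aggregated metrics only, and that names in metrics and drop_original_metrics are spelled identically.

```python
DEFAULT_INTERVAL_SECONDS = 60  # default of the global metrics_collection_interval


def split_list(value):
    """Split a comma-separated property value, ignoring empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def summarize(system_metrics):
    """Return {group: (interval_seconds, per_resource_metrics)}."""
    global_props = system_metrics.get("Properties", {})
    global_interval = int(
        global_props.get("metrics_collection_interval", DEFAULT_INTERVAL_SECONDS)
    )
    summary = {}
    for group in system_metrics.get("Configurations", []):
        props = group.get("Properties", {})
        # A group-level interval overrides the global one for that group only.
        interval = int(props.get("metrics_collection_interval", global_interval))
        if "drop_original_metrics" in props and props.get("resources", "*") != "":
            dropped = set(split_list(props["drop_original_metrics"]))
            per_resource = [
                m for m in split_list(props.get("metrics", "")) if m not in dropped
            ]
        else:
            # Assumption: by default, only aggregated metrics are published.
            per_resource = []
        summary[group["Classification"]] = (interval, per_resource)
    return summary


# The emr-system-metrics block from the example above.
example = {
    "Classification": "emr-system-metrics",
    "Properties": {"metrics_collection_interval": "20"},
    "Configurations": [
        {
            "Classification": "cpu",
            "Properties": {
                "metrics": "cpu_usage_guest,cpu_usage_idle",
                "metrics_collection_interval": "30",
                "drop_original_metrics": "cpu_usage_guest",
            },
        },
        {"Classification": "mem", "Properties": {"metrics": "mem_active"}},
        {
            "Classification": "disk",
            "Properties": {
                "metrics": "disk_used_percent",
                "resources": "/,/mnt",
                "drop_original_metrics": "",
            },
        },
    ],
}

for name, (interval, per_resource) in summarize(example).items():
    print(name, interval, per_resource)
```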

Application metrics configurations examples

The following example configures the CloudWatch agent to stop exporting metrics for the Hadoop NameNode service.

[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-hadoop-hdfs-namenode-metrics",
        "Properties": {},
        "Configurations": []
      }
    ]
  }
]

The following example configures a cluster to export Hadoop application metrics.

[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-hadoop-hdfs-namenode-metrics",
        "Properties": {
          "Hadoop:service=NameNode,name=FSNamesystem": "BlockCapacity,CapacityUsedGB",
          "otel.metric.export.interval": "20000"
        },
        "Configurations": []
      },
      {
        "Classification": "emr-hadoop-hdfs-datanode-metrics",
        "Properties": {
          "Hadoop:service=DataNode,name=JvmMetrics": "MemNonHeapUsedM",
          "otel.metric.export.interval": "30000"
        },
        "Configurations": []
      },
      {
        "Classification": "emr-hadoop-yarn-resourcemanager-metrics",
        "Properties": {
          "Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics": "AllocateNumOps,NodeUpdateNumOps"
        },
        "Configurations": []
      }
    ]
  }
]

The previous example has the following properties:

  • Every 20 seconds, the agent collects the BlockCapacity and CapacityUsedGB metrics from instances running the Hadoop NameNode service.

  • Every 30 seconds, the agent collects the MemNonHeapUsedM metric from instances running the Hadoop DataNode service.

  • Every 60 seconds, the agent collects the AllocateNumOps and NodeUpdateNumOps metrics from instances that run the Hadoop YARN ResourceManager. The example doesn't set otel.metric.export.interval for this classification, so the default of "60000" milliseconds applies.
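As a small illustration of how an application metrics classification breaks down, the following Python sketch separates the otel.metric.export.interval property from the MBean entries and converts the interval to seconds, falling back to the documented default of "60000" milliseconds when the property is absent. The function summarize_application_metrics is hypothetical and isn't part of any AWS SDK.

```python
DEFAULT_EXPORT_INTERVAL_MS = 60000  # default from the properties tables


def summarize_application_metrics(classification):
    """Hypothetical helper: return (interval_seconds, {mbean: [metric, ...]})."""
    props = dict(classification.get("Properties", {}))
    interval_ms = int(
        props.pop("otel.metric.export.interval", DEFAULT_EXPORT_INTERVAL_MS)
    )
    # Every remaining key is treated as an MBean name whose value is a
    # comma-delimited list of metrics.
    beans = {
        bean: [m.strip() for m in metrics.split(",") if m.strip()]
        for bean, metrics in props.items()
    }
    return interval_ms / 1000, beans


namenode = {
    "Classification": "emr-hadoop-hdfs-namenode-metrics",
    "Properties": {
        "Hadoop:service=NameNode,name=FSNamesystem": "BlockCapacity,CapacityUsedGB",
        "otel.metric.export.interval": "20000",
    },
    "Configurations": [],
}
resourcemanager = {
    "Classification": "emr-hadoop-yarn-resourcemanager-metrics",
    "Properties": {
        "Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics": "AllocateNumOps,NodeUpdateNumOps"
    },
    "Configurations": [],
}

print(summarize_application_metrics(namenode))
print(summarize_application_metrics(resourcemanager))
```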

Amazon Managed Service for Prometheus example

The following example demonstrates how to configure the CloudWatch agent to export metrics to Amazon Managed Service for Prometheus.

If you are currently exporting metrics to Amazon Managed Service for Prometheus and want to reconfigure the metrics for the cluster and continue exporting metrics to Amazon Managed Service for Prometheus, you must include the properties metrics_destination and prometheus_endpoint.

[
  {
    "Classification": "emr-metrics",
    "Properties": {
      "metrics_destination": "prometheus",
      "prometheus_endpoint": "http://amp-workspace/api/v1/remote_write"
    },
    "Configurations": []
  }
]

To use the CloudWatch agent to export metrics to CloudWatch, use the following example.

[
  {
    "Classification": "emr-metrics",
    "Properties": {
      "metrics_destination": "cloudwatch"
    },
    "Configurations": []
  }
]

Note

The CloudWatch agent has a Prometheus exporter that renames certain attributes. For the default metrics labels, Amazon Managed Service for Prometheus uses underscore characters in place of the periods that Amazon CloudWatch uses. If you use Amazon Managed Grafana to visualize the default metrics in Amazon Managed Service for Prometheus, the labels appear as cluster_id, instance_id, node_type, and service_name.
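The renaming described in this note amounts to replacing periods with underscores in the default label names. The following Python sketch shows that mapping for the four default labels; the function name to_prometheus_label is made up for illustration, and the sketch covers only the default labels mentioned here, not every attribute the exporter might rename.

```python
def to_prometheus_label(cloudwatch_dimension):
    """Hypothetical helper: map a CloudWatch dimension name to its Prometheus label."""
    return cloudwatch_dimension.replace(".", "_")


default_dimensions = ["cluster.id", "instance.id", "node.type", "service.name"]
labels = [to_prometheus_label(name) for name in default_dimensions]
print(labels)
```

A mapping like this can be handy when you translate a CloudWatch dashboard filter into a query against the same metrics in Amazon Managed Service for Prometheus.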