Configure CloudWatch agent for Amazon EMR 7.1.0
Starting with Amazon EMR 7.1.0, you can use the Amazon EMR configuration API to configure the Amazon CloudWatch agent to collect additional system metrics, add application metrics, and change the metrics destination. For more information about how to use the EMR configuration API to configure your cluster’s applications, see Configure applications.
Note
Amazon EMR 7.1.0 only supports the reconfiguration type OVERWRITE. For more information about the reconfiguration types, see Considerations when you reconfigure an instance group.
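As a rough sketch of what the OVERWRITE restriction might look like in practice, the following Python snippet builds a request payload for reconfiguring an instance group. The helper name build_reconfiguration_request and both IDs are hypothetical, the payload shape assumes the EMR ModifyInstanceGroups API, and the snippet only builds a dictionary without calling AWS.

```python
import json


def build_reconfiguration_request(cluster_id, instance_group_id, configurations):
    """Build keyword arguments for a ModifyInstanceGroups call (assumed payload shape)."""
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                # Amazon EMR 7.1.0 only supports OVERWRITE for this reconfiguration.
                "ReconfigurationType": "OVERWRITE",
                "Configurations": configurations,
            }
        ],
    }


# Hypothetical cluster and instance group IDs, for illustration only.
request = build_reconfiguration_request(
    "j-EXAMPLECLUSTER",
    "ig-EXAMPLEGROUP",
    [{"Classification": "emr-metrics", "Properties": {}, "Configurations": []}],
)
print(json.dumps(request, indent=2))
```

With boto3, a dictionary like this could presumably be passed as keyword arguments to the EMR client's modify_instance_groups method. Because OVERWRITE replaces the earlier configuration, include every setting that you want to keep.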
Configuration schema
emr-metrics has the following classifications:
- emr-system-metrics: configure system metrics, such as CPU, disk, and memory.
- emr-hadoop-hdfs-datanode-metrics: configure Hadoop DataNode JMX metrics.
- emr-hadoop-hdfs-namenode-metrics: configure Hadoop NameNode JMX metrics.
- emr-hadoop-yarn-nodemanager-metrics: configure YARN NodeManager JMX metrics.
- emr-hadoop-yarn-resourcemanager-metrics: configure YARN ResourceManager JMX metrics.
- emr-hbase-master-metrics: configure HBase Master JMX metrics.
- emr-hbase-region-server-metrics: configure HBase Region Server JMX metrics.
- emr-hbase-rest-server-metrics: configure HBase REST Server JMX metrics.
- emr-hbase-thrift-server-metrics: configure HBase Thrift Server JMX metrics.
The following tables describe the available properties and configurations for all of the classifications.
emr-metrics properties
Property | Required | Description | Default value | Possible values | Notes |
---|---|---|---|---|---|
metrics_destination | Optional | Determines whether cluster metrics are published to Amazon CloudWatch or Amazon Managed Service for Prometheus. | "CLOUDWATCH" | "CLOUDWATCH", "PROMETHEUS" | This property is case-insensitive. For example, "Cloudwatch" is the same as "CLOUDWATCH". |
prometheus_endpoint | Optional | If metrics_destination is set to "PROMETHEUS", this property configures the CloudWatch agent to send metrics to the provided Amazon Managed Service for Prometheus remote write endpoint. | N/A | Any valid Amazon Managed Service for Prometheus remote write URL. The remote write URL format is | This field is required if metrics_destination is set to "PROMETHEUS". Provisioning will fail if you don't provide a key or if the value is an empty string. |
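The rules in the table above can be checked before you submit a configuration. The following Python sketch is one way to do that, assuming a hypothetical helper named build_emr_metrics. It is not part of any AWS SDK, and it only mirrors the documented rules: the destination is case-insensitive, and a non-empty prometheus_endpoint is required when the destination is "PROMETHEUS".

```python
VALID_DESTINATIONS = {"CLOUDWATCH", "PROMETHEUS"}


def build_emr_metrics(metrics_destination="CLOUDWATCH", prometheus_endpoint=None,
                      configurations=None):
    """Return an emr-metrics classification after basic local validation."""
    # The property is case-insensitive, so compare in upper case.
    destination = metrics_destination.upper()
    if destination not in VALID_DESTINATIONS:
        raise ValueError(f"Unsupported metrics_destination: {metrics_destination!r}")

    properties = {"metrics_destination": metrics_destination}
    if destination == "PROMETHEUS":
        # A missing key or an empty string would make provisioning fail.
        if not prometheus_endpoint:
            raise ValueError("prometheus_endpoint is required for PROMETHEUS")
        properties["prometheus_endpoint"] = prometheus_endpoint

    return {
        "Classification": "emr-metrics",
        "Properties": properties,
        "Configurations": configurations or [],
    }


# Placeholder endpoint, for illustration only.
prometheus_config = build_emr_metrics(
    "prometheus", "http://amp-workspace/api/v1/remote_write"
)
print(prometheus_config)
```

Validating locally only catches these two documented failure cases earlier. It doesn't confirm that the endpoint is reachable or that the workspace exists.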
emr-system-metrics properties
Property | Required | Description | Default value | Possible values | Notes |
---|---|---|---|---|---|
metrics_collection_interval | Optional | How often in seconds metrics are collected and published from the CloudWatch agent. | "60" | A string specifying the number of seconds. Only accepts whole numbers. | You can override this property with the metrics_collection_interval property from individual metric groups. |
emr-system-metrics configurations
emr-hadoop-hdfs-datanode-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=DataNode,name=DataNodeActivity. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BlocksCached,BlocksRead. |
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop DataNode metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hadoop-hdfs-namenode-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=NameNode,name=FSNamesystem. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BlockCapacity,CapacityUsedGB. |
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop NameNode metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hadoop-yarn-nodemanager-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=NodeManager,name=NodeManagerMetrics. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, MaxCapacity,AllocatedGB. |
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop YARN NodeManager metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hadoop-yarn-resourcemanager-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=ResourceManager,name=PartitionQueueMetrics. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, MaxCapacity,MaxCapacityVCores. |
otel.metric.export.interval | Optional | How often in milliseconds to collect Hadoop YARN ResourceManager metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hbase-master-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=Master,sub=AssignmentManager. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, AssignFailedCount,AssignSubmittedCount. |
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Master metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hbase-region-server-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=RegionServer,sub=IPC. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, numActiveHandler,numActivePriorityHandler. |
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Region Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hbase-rest-server-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=REST. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, successfulPut,successfulScanCount. |
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase REST Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
emr-hbase-thrift-server-metrics properties
Property | Required | Description | Default value | Possible values |
---|---|---|---|---|
| Optional | The MBean that the CloudWatch agent should collect metrics from, such as Hadoop:service=HBase,name=Thrift,sub=ThriftOne. You can find sample MBean names and their corresponding metrics in the example JMX YAML files. | N/A | A string containing the comma-delimited list of metrics that are associated with the MBean. For example, BatchGet_max,BatchGet_mean. |
otel.metric.export.interval | Optional | How often in milliseconds to collect HBase Thrift Server metrics. | "60000" | A string specifying the number of milliseconds. Accepts only whole numbers. |
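All of the application classifications share the same shape: each property key is an MBean name, each value is a comma-delimited metric list, and otel.metric.export.interval is an optional string of whole milliseconds. The following Python sketch shows one way to assemble such a classification. The helper build_jmx_classification is hypothetical, and the positive-integer check is an assumption beyond the documented "whole numbers" rule.

```python
def build_jmx_classification(classification, mbean_metrics, export_interval_ms=None):
    """Build one application-metrics classification from a {MBean: [metrics]} mapping."""
    # Each MBean name becomes a property key; its metrics become a comma-delimited string.
    properties = {mbean: ",".join(metrics) for mbean, metrics in mbean_metrics.items()}

    if export_interval_ms is not None:
        # Assumption: reject anything that is not a positive whole number.
        if isinstance(export_interval_ms, bool) or not isinstance(export_interval_ms, int):
            raise ValueError("export_interval_ms must be a whole number")
        if export_interval_ms <= 0:
            raise ValueError("export_interval_ms must be positive")
        # The property value is a string, not a number.
        properties["otel.metric.export.interval"] = str(export_interval_ms)

    return {
        "Classification": classification,
        "Properties": properties,
        "Configurations": [],
    }


namenode = build_jmx_classification(
    "emr-hadoop-hdfs-namenode-metrics",
    {"Hadoop:service=NameNode,name=FSNamesystem": ["BlockCapacity", "CapacityUsedGB"]},
    export_interval_ms=20000,
)
print(namenode)
```

The returned dictionary is meant to be nested inside the Configurations list of an emr-metrics classification.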
System metrics configurations examples
The following example demonstrates how to configure the CloudWatch agent to stop exporting all system metrics.
[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-system-metrics",
        "Properties": {},
        "Configurations": []
      }
    ]
  }
]
The following example configures the CloudWatch agent to export the default system metrics. Doing so is a quick way to reset the agent back to only exporting the default system metrics if you've already reconfigured the system metrics at least once. This reset also removes any application metrics that were reconfigured before.
[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": []
  }
]
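The two examples above differ only in whether an empty emr-system-metrics entry is present, which is easy to misread. The following Python sketch classifies a configuration by that difference. The function system_metrics_mode and its return labels are hypothetical, and the sketch returns "unspecified" for cases that this page doesn't describe.

```python
def system_metrics_mode(configurations):
    """Describe how an emr-metrics configuration treats system metrics."""
    for top in configurations:
        if top.get("Classification") != "emr-metrics":
            continue
        nested = top.get("Configurations", [])
        if not nested:
            # No nested classifications: the agent exports the default system metrics.
            return "default"
        for entry in nested:
            if entry.get("Classification") == "emr-system-metrics":
                # An emr-system-metrics entry without metric groups stops the export.
                return "custom" if entry.get("Configurations") else "disabled"
        return "unspecified"
    return "unspecified"


stop_exporting = [{
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
        {"Classification": "emr-system-metrics", "Properties": {}, "Configurations": []}
    ],
}]
reset_to_default = [
    {"Classification": "emr-metrics", "Properties": {}, "Configurations": []}
]

print(system_metrics_mode(stop_exporting))    # disabled
print(system_metrics_mode(reset_to_default))  # default
```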
The following example configures the cluster to export the cpu, mem, and disk metrics.
[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-system-metrics",
        "Properties": {
          "metrics_collection_interval": "20"
        },
        "Configurations": [
          {
            "Classification": "cpu",
            "Properties": {
              "metrics": "cpu_usage_guest,cpu_usage_idle",
              "metrics_collection_interval": "30",
              "drop_original_metrics": "cpu_usage_guest"
            }
          },
          {
            "Classification": "mem",
            "Properties": {
              "metrics": "mem_active"
            }
          },
          {
            "Classification": "disk",
            "Properties": {
              "metrics": "disk_used_percent",
              "resources": "/,/mnt",
              "drop_original_metrics": ""
            }
          }
        ]
      }
    ]
  }
]
The previous example configuration has the following properties:
- Every 30 seconds, the agent collects the cpu_usage_guest metric for all CPUs. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. Because the drop_original_metrics property contains cpu_usage_guest, the agent doesn't export the per-CPU metrics.
- Every 30 seconds, the agent collects the cpu_usage_idle metric for all CPUs. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. The agent also collects the per-CPU metrics, which you can find in the same namespace. The agent collects them because the drop_original_metrics property doesn't contain cpu_usage_idle, so the agent doesn't ignore the metric.
- Every 20 seconds, the agent collects the mem_active metric. You can find the aggregated metric under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name.
- Every 20 seconds, the agent collects the disk_used_percent metrics for the / and /mnt disk mounts. You can find the aggregated metrics under the CloudWatch namespace CWAgent > cluster.id, instance.id, node.type, service.name. The agent also collects the per-mount metrics, which you can find in the same namespace. The agent collects them because the drop_original_metrics property doesn't contain disk_used_percent, so the agent doesn't ignore the metric.
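The interval override and the drop_original_metrics behavior described above can be traced with a short script. The following Python sketch assumes a hypothetical helper, summarize_system_metrics, that applies the documented rules to the emr-system-metrics block from the example. It models how to read the configuration and doesn't reproduce the agent's actual implementation.

```python
DEFAULT_INTERVAL_SECONDS = 60  # documented default for metrics_collection_interval


def split_csv(value):
    """Split a comma-delimited string and discard empty items."""
    return [item for item in value.split(",") if item]


def summarize_system_metrics(system_metrics):
    """Return the effective interval and metric lists for each metric group."""
    top_props = system_metrics.get("Properties", {})
    fallback = int(top_props.get("metrics_collection_interval", DEFAULT_INTERVAL_SECONDS))
    summary = {}
    for group in system_metrics.get("Configurations", []):
        props = group.get("Properties", {})
        metrics = split_csv(props.get("metrics", ""))
        dropped = set(split_csv(props.get("drop_original_metrics", "")))
        summary[group["Classification"]] = {
            # A group-level interval overrides the emr-system-metrics interval.
            "interval_seconds": int(props.get("metrics_collection_interval", fallback)),
            "metrics": metrics,
            # Metrics that aren't listed in drop_original_metrics keep their originals.
            "originals_kept": [m for m in metrics if m not in dropped],
        }
    return summary


example = {
    "Classification": "emr-system-metrics",
    "Properties": {"metrics_collection_interval": "20"},
    "Configurations": [
        {"Classification": "cpu", "Properties": {
            "metrics": "cpu_usage_guest,cpu_usage_idle",
            "metrics_collection_interval": "30",
            "drop_original_metrics": "cpu_usage_guest"}},
        {"Classification": "mem", "Properties": {"metrics": "mem_active"}},
        {"Classification": "disk", "Properties": {
            "metrics": "disk_used_percent",
            "resources": "/,/mnt",
            "drop_original_metrics": ""}},
    ],
}

for name, details in summarize_system_metrics(example).items():
    print(name, details)
```

For this input, cpu resolves to 30 seconds, and mem and disk inherit 20 seconds. Only cpu_usage_guest is listed in drop_original_metrics.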
Application metrics configurations examples
The following example configures the CloudWatch agent to stop exporting metrics for the Hadoop NameNode service.
[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-hadoop-hdfs-namenode-metrics",
        "Properties": {},
        "Configurations": []
      }
    ]
  }
]
The following example configures a cluster to export Hadoop application metrics.
[
  {
    "Classification": "emr-metrics",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "emr-hadoop-hdfs-namenode-metrics",
        "Properties": {
          "Hadoop:service=NameNode,name=FSNamesystem": "BlockCapacity,CapacityUsedGB",
          "otel.metric.export.interval": "20000"
        },
        "Configurations": []
      },
      {
        "Classification": "emr-hadoop-hdfs-datanode-metrics",
        "Properties": {
          "Hadoop:service=DataNode,name=JvmMetrics": "MemNonHeapUsedM",
          "otel.metric.export.interval": "30000"
        },
        "Configurations": []
      },
      {
        "Classification": "emr-hadoop-yarn-resourcemanager-metrics",
        "Properties": {
          "Hadoop:service=ResourceManager,name=CapacitySchedulerMetrics": "AllocateNumOps,NodeUpdateNumOps"
        },
        "Configurations": []
      }
    ]
  }
]
The previous example has the following properties:
- Every 20 seconds, the agent collects the BlockCapacity and CapacityUsedGB metrics from instances running the Hadoop NameNode service.
- Every 30 seconds, the agent collects the MemNonHeapUsedM metric from instances running the Hadoop DataNode service.
- Every 60 seconds, the agent collects the AllocateNumOps and NodeUpdateNumOps metrics from instances that run the Hadoop YARN ResourceManager. This classification doesn't set otel.metric.export.interval, so the default of 60000 milliseconds applies.
Amazon Managed Service for Prometheus example
The following example demonstrates how to configure the CloudWatch agent to export metrics to Amazon Managed Service for Prometheus.
If you are currently exporting metrics to Amazon Managed Service for Prometheus and want to reconfigure the metrics for the cluster and continue exporting metrics to Amazon Managed Service for Prometheus, you must include the properties metrics_destination and prometheus_endpoint.
[
  {
    "Classification": "emr-metrics",
    "Properties": {
      "metrics_destination": "prometheus",
      "prometheus_endpoint": "http://amp-workspace/api/v1/remote_write"
    },
    "Configurations": []
  }
]
To use the CloudWatch agent to export metrics to CloudWatch, use the following example.
[
  {
    "Classification": "emr-metrics",
    "Properties": {
      "metrics_destination": "cloudwatch"
    },
    "Configurations": []
  }
]
Note
The CloudWatch agent has a Prometheus exporter that renames certain attributes. For the default metrics labels, Amazon Managed Service for Prometheus uses underscore characters in place of the periods that Amazon CloudWatch uses. If you use Amazon Managed Grafana to visualize the default metrics in Amazon Managed Service for Prometheus, the labels appear as cluster_id, instance_id, node_type, and service_name.
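When you translate a CloudWatch dimension name into a Prometheus label for a dashboard query, the renaming described in this note amounts to a character substitution. The following minimal Python sketch uses a hypothetical helper, to_prometheus_label, and covers only the four default labels named above. Other attributes might be renamed differently.

```python
CLOUDWATCH_DIMENSIONS = ["cluster.id", "instance.id", "node.type", "service.name"]


def to_prometheus_label(cloudwatch_dimension):
    """Replace the periods in a CloudWatch dimension name with underscores."""
    return cloudwatch_dimension.replace(".", "_")


labels = [to_prometheus_label(name) for name in CLOUDWATCH_DIMENSIONS]
print(labels)  # ['cluster_id', 'instance_id', 'node_type', 'service_name']
```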