Amazon CloudWatch
Developer Guide (API Version 2010-08-01)

Amazon CloudWatch Metrics, Namespaces, and Dimensions Reference

AWS Namespaces

All AWS services that provide Amazon CloudWatch data use a namespace string, beginning with "AWS/". The following services push metric data points to CloudWatch.

AWS Product Namespace

AWS Billing

AWS/Billing

Amazon DynamoDB

AWS/DynamoDB

Amazon ElastiCache

AWS/ElastiCache

Amazon Elastic Block Store

AWS/EBS

Amazon Elastic Compute Cloud

AWS/EC2

Amazon Elastic MapReduce

AWS/ElasticMapReduce

Amazon Relational Database Service

AWS/RDS

Amazon Simple Notification Service

AWS/SNS

Amazon Simple Queue Service

AWS/SQS

Amazon Storage Gateway

AWS/StorageGateway

Auto Scaling

AWS/AutoScaling

Elastic Load Balancing

AWS/ELB

AWS Billing Dimensions and Metrics

Metric Description

EstimatedCharges

The estimated charges for your AWS usage. This can either be estimated charges for one service or a roll-up of estimated charges for all services.

Dimensions for AWS Billing Metrics

AWS Billing sends the ServiceName, LinkedAccount, and Currency dimensions to Amazon CloudWatch.

Dimension Description

ServiceName

The name of the AWS service. This dimension is omitted for the total of estimated charges across all services.

LinkedAccount

The linked account number. This is used for consolidated billing only. This dimension is omitted for the total of all accounts.

Currency

The monetary currency to bill the account. This dimension is required.

Unit: USD
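
As a rough illustration of how these dimensions combine, the following Python sketch builds the dimension list for an EstimatedCharges query. The helper name billing_dimensions and the sample service value are hypothetical; the Name/Value pair shape is assumed to match the dimension format used in CloudWatch API requests.

```python
def billing_dimensions(service_name=None, linked_account=None, currency="USD"):
    """Build the dimension list for an EstimatedCharges query.

    Currency is always included because it is required. ServiceName and
    LinkedAccount are omitted to request the total across all services
    or across all accounts.
    """
    dimensions = [{"Name": "Currency", "Value": currency}]
    if service_name is not None:
        dimensions.append({"Name": "ServiceName", "Value": service_name})
    if linked_account is not None:
        dimensions.append({"Name": "LinkedAccount", "Value": linked_account})
    return dimensions


# Total estimated charges across all services and accounts.
total = billing_dimensions()

# Estimated charges for one service (the value "AmazonEC2" is illustrative).
ec2_only = billing_dimensions(service_name="AmazonEC2")
```

Leaving out ServiceName or LinkedAccount, rather than passing an empty value, mirrors the rule above that these dimensions are omitted for totals.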

Amazon DynamoDB Dimensions and Metrics

The following metrics are available from the Amazon DynamoDB Service. The service only sends metrics when they have a non-zero value. For example, if no requests generating a 400 status code occur in a time period, you would see no data for the UserErrors metric that reports requests generating a 400 status code.

Note

The Statistic values available through Amazon CloudWatch, such as Average or Sum, are not always applicable to every metric. However, they are all available through the console, API, and command line client for all services. For each metric, check the list of Valid Statistics for the Amazon DynamoDB metrics so that you track useful information. For example, Amazon CloudWatch can monitor each time an Amazon DynamoDB request is refused (the ThrottledRequests metric). It marks that event as one occurrence. If the request is retried and also refused, Amazon CloudWatch marks the second event as one occurrence, too. The Sum statistic is now 2. However, the Average statistic for the ThrottledRequests metric is simply 1 whenever a request is throttled in the specified time period, whether once or repeatedly. For the ThrottledRequests metric, use the listed Valid Statistics (either Sum or SampleCount) to see the trend of ThrottledRequests over a specified time period.
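
The ThrottledRequests example in the note can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not of how CloudWatch computes statistics internally; the function name summarize is hypothetical.

```python
def summarize(data_points):
    """Return the Sum, SampleCount, and Average of a list of data points."""
    total = sum(data_points)
    count = len(data_points)
    average = total / count if count else None
    return {"Sum": total, "SampleCount": count, "Average": average}


# Each throttled request is recorded as one occurrence (a data point of 1).
# Here the original request and one retry were both throttled.
first_and_retry = summarize([1, 1])
```

Because every data point is 1, the Average stays at 1 no matter how many requests are throttled, while Sum and SampleCount grow with each occurrence.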

Metric Description
SuccessfulRequestLatency

Successful requests in the specified time period. By default, SuccessfulRequestLatency provides the elapsed time for successful calls. You can view the Minimum, Maximum, or Average statistics over time.

Note

CloudWatch also provides a SampleCount statistic: the total number of successful calls for a sample time period.

View (namespace): AWS/DynamoDB, TableName, Operation

Units: Milliseconds (or a count for SampleCount)

Valid Statistics: Minimum, Maximum, Average, SampleCount

UserErrors

The number of requests generating a 400 status code (likely indicating a client error) response in the specified time period.

View (namespace): All Metrics

Units: Count

Valid Statistics: Sum, SampleCount

SystemErrors

The number of requests generating a 500 status code (likely indicating a server error) response in the specified time period.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Sum, SampleCount

ThrottledRequests

The number of user requests that exceeded the preset provisioned throughput limits in the specified time period.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Sum, SampleCount

ProvisionedReadCapacityUnits

The amount of read capacity units provisioned for the table. For more information, see Provisioned Throughput in Amazon DynamoDB.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Minimum, Maximum, Average, Sum

ProvisionedWriteCapacityUnits

The amount of write capacity units provisioned for the table. For more information, see Provisioned Throughput in Amazon DynamoDB.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Minimum, Maximum, Average, Sum

ConsumedReadCapacityUnits

The amount of read capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. For more information, see Provisioned Throughput in Amazon DynamoDB.

Note

Use the Sum value to calculate the consumed throughput. For example, get the Sum value over a span of 5 minutes. Divide the Sum value by the number of seconds in 5 minutes (300) to get an average for the ConsumedReadCapacityUnits per second. You can compare the calculated value to the provisioned throughput value you provide to Amazon DynamoDB.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Minimum, Maximum, Average, Sum

ConsumedWriteCapacityUnits

The amount of write capacity units consumed over the specified time period, so you can track how much of your provisioned throughput is used. For more information, see Provisioned Throughput in Amazon DynamoDB.

Note

Use the Sum value to calculate the consumed throughput. For example, get the Sum value over a span of 5 minutes. Divide the Sum value by the number of seconds in 5 minutes (300) to get an average for the ConsumedWriteCapacityUnits per second. You can compare the calculated value to the provisioned throughput value you provide to Amazon DynamoDB.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Minimum, Maximum, Average, Sum

ReturnedItemCount

The number of items returned by a Scan or Query operation.

View (namespace): AWS/DynamoDB, TableName

Units: Count

Valid Statistics: Minimum, Maximum, Average, SampleCount, Sum
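
The per-second calculation described in the notes for ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits can be sketched as follows. The function names and the sample numbers are hypothetical; only the division by the 300 seconds in a five-minute span comes from the notes.

```python
def consumed_units_per_second(sum_value, period_seconds=300):
    """Average capacity units consumed per second over the period."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return sum_value / period_seconds


def fraction_of_provisioned(sum_value, provisioned_units, period_seconds=300):
    """Compare the per-second consumption to the provisioned throughput."""
    return consumed_units_per_second(sum_value, period_seconds) / provisioned_units


# Illustrative numbers: a Sum of 1500 units over 5 minutes,
# on a table provisioned with 10 units per second.
per_second = consumed_units_per_second(1500)
used_fraction = fraction_of_provisioned(1500, provisioned_units=10)
```

With these sample values the table consumes 5 units per second on average, which is half of the provisioned throughput.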

Dimensions for Amazon DynamoDB Metrics

The metrics for Amazon DynamoDB are qualified by the values for the account, table name, or operation. Account level metrics display when you select AWS/DynamoDB as the viewing option. Otherwise, Amazon DynamoDB data can be retrieved along any of the following dimensions in the table below. Some metrics allow you to specify both a table name and operation, depending on the viewing option you specify.

Dimension

Description

TableName

This dimension limits the data you request to a specific table. This value can be any table name for the current account.

Operation

The operation corresponds to the Amazon DynamoDB service API, and can be one of the following:

  • PutItem

  • DeleteItem

  • UpdateItem

  • GetItem

  • BatchGetItem

  • Scan

  • Query

For all of the operations in the current Amazon DynamoDB service API, see Operations in Amazon DynamoDB.
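
To show how the TableName and Operation dimensions fit together with the Valid Statistics lists, here is a minimal Python sketch that assembles the parameters for a GetMetricStatistics request. The helper build_request and the table name ExampleTable are hypothetical. The resulting dictionary is only a plain data structure; passing it to a client library is left to the reader and is not shown here.

```python
from datetime import datetime, timedelta, timezone

# Valid Statistics copied from the Amazon DynamoDB metric table.
VALID_STATISTICS = {
    "SuccessfulRequestLatency": {"Minimum", "Maximum", "Average", "SampleCount"},
    "UserErrors": {"Sum", "SampleCount"},
    "SystemErrors": {"Sum", "SampleCount"},
    "ThrottledRequests": {"Sum", "SampleCount"},
}


def build_request(metric_name, statistics, table_name=None, operation=None,
                  hours=1, period=300, end_time=None):
    """Assemble request parameters, rejecting statistics the table does not list."""
    invalid = sorted(set(statistics) - VALID_STATISTICS[metric_name])
    if invalid:
        raise ValueError(f"{metric_name} does not list {invalid} as valid statistics")

    # Omitting both dimensions requests account-level data.
    dimensions = []
    if table_name is not None:
        dimensions.append({"Name": "TableName", "Value": table_name})
    if operation is not None:
        dimensions.append({"Name": "Operation", "Value": operation})

    end_time = end_time or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DynamoDB",
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period,
        "Statistics": list(statistics),
    }


request = build_request(
    "SuccessfulRequestLatency",
    ["Average", "Maximum"],
    table_name="ExampleTable",
    operation="GetItem",
)
```

Checking the statistic before sending the request avoids retrieving values, such as the Average of ThrottledRequests, that carry little information.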

Amazon ElastiCache Dimensions and Metrics

Metric Dimensions

All ElastiCache metrics use the "AWS/ElastiCache" namespace and provide metrics for a single dimension, the CacheNodeId, which is the automatically-generated identifier for each cache node in the cache cluster. You can find out what these values are for your cache nodes using the DescribeCacheClusters API or elasticache-describe-cache-clusters command line utility.

Each metric is published under a single set of dimensions. When retrieving metrics, you must supply both the CacheClusterId and CacheNodeId dimensions.

Available Metrics

ElastiCache provides both host-level metrics (for example, CPU usage) and Memcached-specific metrics (for example, the number of gets). These metrics are measured and published for each cache node in 60-second intervals.

The following table lists the host-level metrics provided by Amazon ElastiCache at the cache node level.

Metric Description Unit

CPUUtilization

The percentage of CPU utilization.

Percent

SwapUsage

The amount of swap used on the host.

Bytes

FreeableMemory

The amount of free memory available on the host.

Bytes

NetworkBytesIn

The number of bytes the host has read from the network.

Bytes

NetworkBytesOut

The number of bytes the host has written to the network.

Bytes

The following table lists the cache node-level metrics provided by Amazon ElastiCache that are derived from the Memcached stats command.

Note

For complete documentation of the Memcached stats command, go to https://github.com/memcached/memcached/blob/master/doc/protocol.txt.

Metric Description Unit

BytesUsedForCacheItems

The number of bytes used to store cache items.

Bytes

BytesReadIntoMemcached

The number of bytes that have been read from the network by the cache node.

Bytes

BytesWrittenOutFromMemcached

The number of bytes that have been written to the network by the cache node.

Bytes

CasBadval

The number of CAS (check and set) requests the cache has received where the Cas value did not match the Cas value stored.

Count

CasHits

The number of Cas requests the cache has received where the requested key was found and the Cas value matched.

Count

CasMisses

The number of Cas requests the cache has received where the key requested was not found.

Count

CmdFlush

The number of flush commands the cache has received.

Count

CmdGet

The number of get commands the cache has received.

Count

CmdSet

The number of set commands the cache has received.

Count

CurrConnections

A count of the number of connections connected to the cache at an instant in time. Note that due to the design of Memcached, this will always return a minimum count of 10.

Count

CurrItems

A count of the number of items currently stored in the cache.

Count

DecrHits

The number of decrement requests the cache has received where the requested key was found.

Count

DecrMisses

The number of decrement requests the cache has received where the requested key was not found.

Count

DeleteHits

The number of delete requests the cache has received where the requested key was found.

Count

DeleteMisses

The number of delete requests the cache has received where the requested key was not found.

Count

Evictions

The number of non-expired items the cache evicted to allow space for new writes.

Count

GetHits

The number of get requests the cache has received where the key requested was found.

Count

GetMisses

The number of get requests the cache has received where the key requested was not found.

Count

IncrHits

The number of increment requests the cache has received where the key requested was found.

Count

IncrMisses

The number of increment requests the cache has received where the key requested was not found.

Count

Reclaimed

The number of expired items the cache evicted to allow space for new writes.

Count

For Memcached 1.4.14, the following additional metrics are provided.

Metric Description Unit

BytesUsedForHash

The number of bytes currently used by hash tables.

Bytes

CmdConfigGet

The cumulative number of "config get" requests.

Count

CmdConfigSet

The cumulative number of "config set" requests.

Count

CmdTouch

The cumulative number of "touch" requests.

Count

CurrConfig

The current number of configurations stored.

Count

EvictedUnfetched

The number of valid items evicted from the least recently used cache (LRU) which were never touched after being set.

Count

ExpiredUnfetched

The number of expired items reclaimed from the LRU which were never touched after being set.

Count

SlabsMoved

The total number of slab pages that have been moved.

Count

TouchHits

The number of keys that have been touched and were given a new expiration time.

Count

TouchMisses

The number of items that have been touched, but were not found.

Count

The following table describes the available calculated cache level metrics.

Metric Description Unit

NewConnections

The number of new connections the cache has received. This is derived from the memcached total_connections statistic by recording the change in total_connections across a period of time. This will always be at least 1, due to a connection reserved for ElastiCache.

Count

NewItems

The number of new items the cache has stored. This is derived from the memcached total_items statistic by recording the change in total_items across a period of time.

Count

UnusedMemory

The amount of unused memory the cache can use to store items. This is derived from the memcached statistics limit_maxbytes and bytes by subtracting bytes from limit_maxbytes.

Bytes
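
The three calculated metrics above can be derived from two snapshots of the Memcached stats output, as in the following sketch. The function name and the sample values are hypothetical; the statistic names (total_connections, total_items, limit_maxbytes, bytes) are the ones named in the table.

```python
def calculated_metrics(previous_stats, current_stats):
    """Derive the calculated metrics from two Memcached stats snapshots.

    previous_stats and current_stats are dictionaries of stat name to
    integer value, taken at the start and end of the period.
    """
    return {
        # Change in the cumulative counters across the period.
        "NewConnections": current_stats["total_connections"]
        - previous_stats["total_connections"],
        "NewItems": current_stats["total_items"] - previous_stats["total_items"],
        # Memory limit minus the bytes currently used for items.
        "UnusedMemory": current_stats["limit_maxbytes"] - current_stats["bytes"],
    }


# Illustrative snapshots taken one period apart.
previous = {"total_connections": 120, "total_items": 1000,
            "limit_maxbytes": 67108864, "bytes": 524288}
current = {"total_connections": 125, "total_items": 1040,
           "limit_maxbytes": 67108864, "bytes": 1048576}

metrics = calculated_metrics(previous, current)
```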

Amazon EBS Dimensions and Metrics

Amazon Elastic Block Store sends data points to Amazon CloudWatch for several metrics. Standard mounted Amazon EBS volumes automatically send five-minute metrics to Amazon CloudWatch. Provisioned IOPS volumes automatically send one-minute metrics to Amazon CloudWatch.

Amazon EBS Metrics

You can use the Amazon CloudWatch GetMetricStatistics API to get any of the Amazon EBS volume metrics listed in the following table. Similar metrics are grouped together in the table, and the metrics in the first two rows are also available for the local stores on Amazon EC2 instances.

Metric Description

VolumeReadBytes

VolumeWriteBytes

The total number of bytes transferred in the period.

Units: Bytes

VolumeReadOps

VolumeWriteOps

The total number of operations in the period.

Units: Count

VolumeTotalReadTime

VolumeTotalWriteTime

The total number of seconds spent by all operations that completed in the period. If multiple requests are submitted at the same time, this total could be greater than the length of the period. For example, say the period is 5 minutes (300 seconds); if 700 operations completed during that period, and each operation took 1 second, the value would be 700 seconds.

Units: Seconds

VolumeIdleTime

The total number of seconds in the period when no read or write operations were submitted.

Units: Seconds

VolumeQueueLength

The number of read and write operation requests waiting to be completed in the period.

Units: Count

VolumeThroughputPercentage

Used with Provisioned IOPS volumes only. The percentage of I/O operations per second (IOPS) delivered out of the IOPS provisioned for an EBS volume. Provisioned IOPS volumes deliver within 10 percent of the provisioned IOPS performance 99.9 percent of the time over a given year.

Note

During a write, if there are no other pending I/O requests in a minute, the metric value will be 100 percent. Also, a volume's I/O performance may become degraded temporarily due to an action you have taken (e.g., creating a snapshot of a volume during peak usage, running the volume on a non-EBS-optimized instance, accessing data on the volume for the first time).

Units: Percent

VolumeConsumedReadWriteOps

Used with Provisioned IOPS volumes only. The total amount of read and write operations consumed in the period.

Units: Count
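
The VolumeTotalReadTime and VolumeTotalWriteTime example can be checked with a short Python sketch. The total is a sum of individual operation durations, so it can exceed the length of the period when operations overlap. Dividing the total by the operation count to get an average time per operation is an assumption added here for illustration; it is not stated in the table. All names are hypothetical.

```python
def total_operation_time(durations_seconds):
    """Sum the time spent by every operation that completed in the period."""
    return sum(durations_seconds)


def average_time_per_operation(total_time_seconds, operation_count):
    """Average seconds per operation (assumed derivation, see lead-in)."""
    if operation_count == 0:
        return 0.0
    return total_time_seconds / operation_count


period_seconds = 300          # a five-minute period
durations = [1.0] * 700       # 700 operations, each taking 1 second

total_time = total_operation_time(durations)
exceeds_period = total_time > period_seconds
average_time = average_time_per_operation(total_time, len(durations))
```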

Dimensions for Amazon EBS Metrics

The only dimension that Amazon EBS sends to Amazon CloudWatch is the Volume ID. This means that all available statistics are filtered by Volume ID.

Amazon Elastic Compute Cloud Dimensions and Metrics

This section discusses the metrics and dimensions that Amazon Elastic Compute Cloud (Amazon EC2) sends to Amazon CloudWatch, and describes how to enable detailed (one-minute) monitoring for an EC2 instance. Amazon CloudWatch offers basic (five-minute) monitoring for Amazon EC2 by default. To access detailed monitoring of Amazon EC2 instances, you must enable it.

Amazon EC2 Metrics

The following metrics are available from each EC2 instance.

Metric Description

CPUUtilization

The percentage of allocated EC2 compute units that are currently in use on the instance. This metric identifies the processing power required to run an application upon a selected instance.

Units: Percent

DiskReadOps

Completed read operations from all ephemeral disks available to the instance. (If your instance uses Amazon EBS, see Amazon EBS Metrics.)

This metric identifies the rate at which an application reads a disk. This can be used to determine the speed at which an application reads data from a hard disk.

Units: Count

DiskWriteOps

Completed write operations to all ephemeral disks available to the instance. (If your instance uses Amazon EBS, see Amazon EBS Metrics.)

This metric identifies the rate at which an application writes to a hard disk. This can be used to determine the speed at which an application saves data to a hard disk.

Units: Count

DiskReadBytes

Bytes read from all ephemeral disks available to the instance. (If your instance uses Amazon EBS, see Amazon EBS Metrics.)

This metric is used to determine the volume of the data the application reads from the hard disk of the instance. This can be used to determine the speed of the application.

Units: Bytes

DiskWriteBytes

Bytes written to all ephemeral disks available to the instance. (If your instance uses Amazon EBS, see Amazon EBS Metrics.)

This metric is used to determine the volume of the data the application writes onto the hard disk of the instance. This can be used to determine the speed of the application.

Units: Bytes

NetworkIn

The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to an application on a single instance.

Units: Bytes

NetworkOut

The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic to an application on a single instance.

Units: Bytes

StatusCheckFailed

A combination of StatusCheckFailed_Instance and StatusCheckFailed_System that reports if either of the status checks has failed. Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.

Note

Status check metrics are available at 5 minute frequency and are not available in Detailed Monitoring. For a newly launched instance, status check metric data will only be available after the instance has completed the initialization state. Status check metrics will become available within a few minutes of being in the running state.

Units: Count

StatusCheckFailed_Instance

Reports whether the instance has passed the EC2 instance status check in the last 5 minutes. Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.

Note

Status check metrics are available at 5 minute frequency and are not available in Detailed Monitoring. For a newly launched instance, status check metric data will only be available after the instance has completed the initialization state. Status check metrics will become available within a few minutes of being in the running state.

Units: Count

StatusCheckFailed_System

Reports whether the instance has passed the EC2 system status check in the last 5 minutes. Values for this metric are either 0 (zero) or 1 (one). A zero indicates that the status check passed. A one indicates a status check failure.

Note

Status check metrics are available at 5 minute frequency and are not available in Detailed Monitoring. For a newly launched instance, status check metric data will only be available after the instance has completed the initialization state. Status check metrics will become available within a few minutes of being in the running state.

Units: Count
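
The relationship between the three status check metrics can be expressed as a small Python function. This is a sketch of the rule stated in the table (the combined metric is 1 if either check failed); the function name is hypothetical.

```python
def status_check_failed(instance_failed, system_failed):
    """Combine the instance and system status check values.

    Each argument is 0 (check passed) or 1 (check failed). The result
    is 1 if either check failed, and 0 if both passed.
    """
    for value in (instance_failed, system_failed):
        if value not in (0, 1):
            raise ValueError("status check values must be 0 or 1")
    return 1 if (instance_failed or system_failed) else 0


both_passed = status_check_failed(0, 0)
system_only = status_check_failed(0, 1)
```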

Amazon CloudWatch data for a new EC2 instance typically becomes available within one minute of the end of the first period of time requested (the aggregation period) in the query. You can set the period—the length of time over which statistics are aggregated—with the Period parameter. For more information on periods, see Periods.

You can use the currently available dimensions for EC2 instances (for example, ImageId or InstanceType) to refine the metrics returned. For information about the dimensions you can use with EC2, see Dimensions for Amazon EC2 Metrics.

Dimensions for Amazon EC2 Metrics

If you're using Detailed Monitoring, you can filter the EC2 instance data using any of the dimensions in the following table.

Dimension

Description

AutoScalingGroupName

This dimension filters the data you request for all instances in a specified capacity group. An AutoScalingGroup is a collection of instances you define if you're using the Auto Scaling service. This dimension is available only for EC2 metrics when the instances are in such an AutoScalingGroup. Available for instances with Detailed or Basic Monitoring enabled.

ImageId

This dimension filters the data you request for all instances running this EC2 Amazon Machine Image (AMI). Available for instances with Detailed Monitoring enabled.

InstanceId

This dimension filters the data you request for the identified instance only. This helps you pinpoint an exact instance from which to monitor data. Available for instances with Detailed Monitoring enabled.

InstanceType

This dimension filters the data you request for all instances running with this specified instance type. This helps you categorize your data by the type of instance running. For example, you might compare data from an m1.small instance and an m1.large instance to determine which has the better business value for your application. Available for instances with Detailed Monitoring enabled.

Activating Detailed Monitoring for Amazon EC2

The following procedure walks through the steps to enable detailed metric collection when launching an EC2 instance. For more information about launching an Amazon EC2 instance, see Launching an Instance from an AMI.

To activate detailed metrics through the console

  1. Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. If necessary, change the region. From the navigation bar, select the region that meets your needs. For more information, see Regions and Endpoints.

  3. Click Launch Instance.

  4. On the Create a New Instance page, click Classic Wizard, and then click Continue.

  5. On the CHOOSE AN AMI page, select an AMI from the list.

  6. On the INSTANCE DETAILS page, configure the settings as appropriate for your AMI.

  7. On the second INSTANCE DETAILS page, click the Enable CloudWatch monitoring for this instance check box, set any other settings as appropriate, and then click Continue.

  8. Continue through the remaining steps of the Request Instances Wizard. On the REVIEW page, click Launch.

The instance you launched has detailed monitoring enabled.

Amazon Elastic MapReduce Dimensions and Metrics

This section discusses the metrics and dimensions that Amazon Elastic MapReduce (Amazon EMR) sends to Amazon CloudWatch. All Amazon EMR job flows automatically send metrics in five-minute intervals. Metrics are archived for two weeks; after that period, the data is discarded.

Amazon EMR Metrics

Amazon EMR sends the following metrics to Amazon CloudWatch.

Note

Amazon EMR pulls metrics from a cluster. If a cluster becomes unreachable, no metrics will be reported until the cluster becomes available again.

Metric Description

CoreNodesPending

The number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists.

Use Case: Monitor cluster health

Units: Count

CoreNodesRunning

The number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists.

Use Case: Monitor cluster health

Units: Count

HBaseBackupFailed

Whether the last backup failed. This is set to 0 by default and updated to 1 if the previous backup attempt failed. This metric is only reported for HBase clusters.

Use Case: Monitor HBase backups

Units: Count

HBaseMostRecentBackupDuration

The amount of time it took the previous backup to complete. This metric is set regardless of whether the last completed backup succeeded or failed. While the backup is ongoing, this metric returns the number of minutes since the backup started. This metric is only reported for HBase clusters.

Use Case: Monitor HBase Backups

Units: Minutes

HBaseTimeSinceLastSuccessfulBackup

The number of elapsed minutes since the last successful HBase backup started on your cluster. This metric is only reported for HBase clusters.

Use Case: Monitor HBase backups

Units: Minutes

HDFSBytesRead

The number of bytes read from HDFS.

Use Case: Analyze cluster performance, Monitor cluster progress

Units: Count

HDFSBytesWritten

The number of bytes written to HDFS.

Use Case: Analyze cluster performance, Monitor cluster progress

Units: Count

HDFSUtilization

The percentage of HDFS storage currently used.

Use Case: Analyze cluster performance

Units: Percent

IsIdle

Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive 5-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer.

Use Case: Monitor cluster performance

Units: Count

JobsFailed

The number of jobs in the cluster that have failed.

Use Case: Monitor cluster health

Units: Count

JobsRunning

The number of jobs in the cluster that are currently running.

Use Case: Monitor cluster health

Units: Count

LiveDataNodes

The percentage of data nodes that are receiving work from Hadoop.

Use Case: Monitor cluster health

Units: Percent

LiveTaskTrackers

The percentage of task trackers that are functional.

Use Case: Monitor cluster health

Units: Percent

MapSlotsOpen

The unused map task capacity. This is calculated as the maximum number of map tasks for a given cluster, less the total number of map tasks currently running in that cluster.

Use Case: Analyze cluster performance

Units: Count

MissingBlocks

The number of blocks in which HDFS has no replicas. These might be corrupt blocks.

Use Case: Monitor cluster health

Units: Count

ReduceSlotsOpen

Unused reduce task capacity. This is calculated as the maximum reduce task capacity for a given cluster, less the number of reduce tasks currently running in that cluster.

Use Case: Analyze cluster performance

Units: Count

RemainingMapTasks

The number of remaining map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. A remaining map task is one that is not in any of the following states: Running, Killed, or Completed.

Use Case: Monitor cluster progress

Units: Count

RemainingMapTasksPerSlot

The ratio of the total map tasks remaining to the total map slots available in the cluster.

Use Case: Analyze cluster performance

Units: Ratio

RemainingReduceTasks

The number of remaining reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated.

Use Case: Monitor cluster progress

Units: Count

RunningMapTasks

The number of running map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs will be generated.

Use Case: Monitor cluster progress

Units: Count

RunningReduceTasks

The number of running reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated.

Use Case: Monitor cluster progress

Units: Count

S3BytesRead

The number of bytes read from Amazon S3.

Use Case: Analyze cluster performance, Monitor cluster progress

Units: Count

S3BytesWritten

The number of bytes written to Amazon S3.

Use Case: Analyze cluster performance, Monitor cluster progress

Units: Count

TaskNodesPending

The number of task nodes waiting to be assigned. All of the task nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists.

Use Case: Monitor cluster health

Units: Count

TaskNodesRunning

The number of task nodes working. Data points for this metric are reported only when a corresponding instance group exists.

Use Case: Monitor cluster health

Units: Count

TotalLoad

The total number of concurrent data transfers.

Use Case: Monitor cluster health

Units: Count
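
The IsIdle guidance above (alarm only after several consecutive idle checks) can be sketched in Python. The function name and the default of six checks are illustrative: six five-minute checks correspond to the thirty-minute example in the IsIdle description.

```python
def should_alarm(is_idle_values, required_consecutive=6):
    """Return True if the most recent checks were all idle.

    is_idle_values is a list of IsIdle data points (0 or 1), oldest
    first, one per five-minute check. A single 1 only shows that the
    cluster was idle at that instant, so several in a row are required.
    """
    if len(is_idle_values) < required_consecutive:
        return False
    recent = is_idle_values[-required_consecutive:]
    return all(value == 1 for value in recent)


# Idle for the last six checks (thirty minutes): alarm.
idle_cluster = should_alarm([0, 1, 1, 1, 1, 1, 1])

# A task ran during the window: no alarm.
busy_cluster = should_alarm([1, 1, 1, 0, 1, 1])
```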

Dimensions for Amazon EMR Metrics

The following dimensions are available for Amazon EMR.

Dimension

Description

JobFlowId

The identifier for a cluster. You can find this value by clicking on the cluster in the Amazon EMR console. It takes the form j-XXXXXXXXXXXXX.

JobId

The identifier of a job within a cluster. You can use this to filter the metrics returned from a cluster down to those that apply to a single job within the cluster. JobId takes the form job_XXXXXXXXXXXX_XXXX.

Amazon RDS Dimensions and Metrics

This section discusses the metrics and dimensions that Amazon Relational Database Service sends to Amazon CloudWatch. Amazon CloudWatch provides detailed monitoring of Amazon RDS by default. Amazon Relational Database Service sends metrics for each active database instance every minute. Unlike Amazon EC2 and Auto Scaling, you do not need to specifically enable detailed monitoring.

Amazon RDS Metrics

The following metrics are available from Amazon Relational Database Service.

Metric Description
BinLogDiskUsage

The amount of disk space occupied by binary logs on the master.

Units: Bytes

CPUUtilization

The percentage of CPU utilization.

Units: Percent

DatabaseConnections

The number of database connections in use.

Units: Count

DiskQueueDepth

The number of outstanding IOs (read/write requests) waiting to access the disk.

Units: Count

FreeableMemory

The amount of available random access memory.

Units: Bytes

FreeStorageSpace

The amount of available storage space.

Units: Bytes

ReplicaLag

The amount of time a Read Replica DB Instance lags behind the source DB Instance.

Units: Seconds

SwapUsage

The amount of swap space used on the DB Instance.

Units: Bytes

ReadIOPS

The average number of disk read I/O operations per second.

Units: Count/Second

WriteIOPS

The average number of disk write I/O operations per second.

Units: Count/Second

ReadLatency

The average amount of time taken per disk read I/O operation.

Units: Seconds

WriteLatency

The average amount of time taken per disk write I/O operation.

Units: Seconds

ReadThroughput

The average number of bytes read from disk per second.

Units: Bytes/Second

WriteThroughput

The average number of bytes written to disk per second.

Units: Bytes/Second

Dimensions for RDS Metrics

Amazon RDS data can be filtered along any of the following dimensions in the table below.

Dimension

Description

DBInstanceIdentifier

This dimension filters the data you request for a specific database instance.

DatabaseClass

This dimension filters the data you request for all instances in a database class. For example, you can aggregate metrics for all instances that belong to the database class db.m1.small.

EngineName

This dimension filters the data you request for the identified engine name only. For example, you can aggregate metrics for all instances that have the engine name mysql.

Amazon SNS Dimensions and Metrics

Amazon SNS sends data points to Amazon CloudWatch for several metrics. All active topics automatically send five-minute metrics to Amazon CloudWatch. Detailed monitoring, or one-minute metrics, is currently unavailable for Amazon SNS. A topic stays active for six hours from the last activity (i.e. any API call) on the topic.

Amazon SNS Metrics

This section discusses the metrics that Amazon Simple Notification Service (Amazon SNS) sends to Amazon CloudWatch.

Metric

Description

NumberOfMessagesPublished

The number of messages published to the topic.

Units: Count

Valid Statistics: Sum

PublishSize

The size of messages published to the topic.

Units: Bytes

Valid Statistics: Minimum, Maximum, Average and Count

NumberOfNotificationsDelivered

The number of messages successfully delivered to all subscriptions of the topic.

Units: Count

Valid Statistics: Sum

NumberOfNotificationsFailed

The number of all notification attempts to the topic that failed delivery.

Units: Count

Valid Statistics: Sum

Dimensions for Amazon SNS Metrics

The only dimension that Amazon SNS sends to Amazon CloudWatch is TopicName. This means that all available statistics are filtered by TopicName.

Amazon SQS Dimensions and Metrics

Amazon SQS sends data points to Amazon CloudWatch for several metrics. All active queues automatically send five-minute metrics to Amazon CloudWatch. Detailed monitoring, or one-minute metrics, is currently unavailable for Amazon SQS. A queue stays active for six hours from the last activity (i.e. any API call) on the queue.

Amazon SQS Metrics

This section discusses the metrics that Amazon Simple Queue Service (Amazon SQS) sends to Amazon CloudWatch.

Metric

Description

NumberOfMessagesSent

The number of messages added to a queue.

Units: Count

Valid Statistics: Sum

SentMessageSize

The size of messages added to a queue.

Units: Bytes

Valid Statistics: Minimum, Maximum, Average and Count

NumberOfMessagesReceived

The number of messages returned by calls to the ReceiveMessage API action.

Units: Count

Valid Statistics: Sum

NumberOfEmptyReceives

The number of ReceiveMessage API calls that did not return a message.

Units: Count

Valid Statistics: Sum

NumberOfMessagesDeleted

The number of messages deleted from the queue.

Units: Count

Valid Statistics: Sum

ApproximateNumberOfMessagesDelayed

The number of messages in the queue that are delayed and not available for reading immediately. This can happen when the queue is configured as a delay queue or when a message has been sent with a delay parameter.

Units: Count

Valid Statistics: Average

ApproximateNumberOfMessagesVisible

The number of messages available for retrieval from the queue.

Units: Count

Valid Statistics: Average

ApproximateNumberOfMessagesNotVisible

The number of messages that are in flight. Messages are considered in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window.

Units: Count

Valid Statistics: Average

Dimensions for Amazon SQS Metrics

The only dimension that Amazon SQS sends to Amazon CloudWatch is QueueName. This means that all available statistics are filtered by QueueName.
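
Because each metric in the table accepts only certain statistics, a query helper can check the combination before sending a request. The following Python sketch is one way to do that. The helper name sqs_metric_request and the queue name are hypothetical, the statistic names are copied from the table as written, and the returned dictionary is only a partial request (a real GetMetricStatistics call also needs a start and end time).

```python
# Valid statistics per metric, copied from the table above.
# Note: the table lists "Count"; the GetMetricStatistics API itself
# names the corresponding statistic "SampleCount".
SQS_VALID_STATISTICS = {
    "NumberOfMessagesSent": {"Sum"},
    "SentMessageSize": {"Minimum", "Maximum", "Average", "Count"},
    "NumberOfMessagesReceived": {"Sum"},
    "NumberOfEmptyReceives": {"Sum"},
    "NumberOfMessagesDeleted": {"Sum"},
    "ApproximateNumberOfMessagesDelayed": {"Average"},
    "ApproximateNumberOfMessagesVisible": {"Average"},
    "ApproximateNumberOfMessagesNotVisible": {"Average"},
}


def sqs_metric_request(queue_name, metric_name, statistic, period=300):
    """Return partial request parameters, or raise if the pair is invalid."""
    valid = SQS_VALID_STATISTICS.get(metric_name)
    if valid is None:
        raise ValueError("unknown SQS metric: %s" % metric_name)
    if statistic not in valid:
        raise ValueError("%s is not a valid statistic for %s"
                         % (statistic, metric_name))
    return {
        "Namespace": "AWS/SQS",
        "MetricName": metric_name,
        # QueueName is the only dimension that Amazon SQS sends.
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Period": period,  # SQS metrics arrive at five-minute granularity
        "Statistics": [statistic],
    }


params = sqs_metric_request("myqueue", "NumberOfMessagesSent", "Sum")
```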

AWS Storage Gateway Dimensions and Metrics

AWS Storage Gateway sends data points to Amazon CloudWatch for several metrics. All active gateways automatically send five-minute metrics to Amazon CloudWatch. Detailed monitoring, or one-minute metrics, is currently unavailable for AWS Storage Gateway.

AWS Storage Gateway Metrics

The following metrics are available from the AWS Storage Gateway Service.

The following table describes the AWS Storage Gateway metrics that you can use to get information about your gateways. Specify the GatewayId or GatewayName dimension for each metric to view the data for a gateway.

Metric

Description

ReadBytes

The total number of bytes read from your on-premises applications in the reporting period for all volumes in the gateway.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

WriteBytes

The total number of bytes written to your on-premises applications in the reporting period for all volumes in the gateway.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

ReadTime

The total number of milliseconds spent to do reads from your on-premises applications in the reporting period for all volumes in the gateway.

Use this metric with the Average statistic to measure average latency.

Units: Milliseconds

WriteTime

The total number of milliseconds spent to do writes from your on-premises applications in the reporting period for all volumes in the gateway.

Use this metric with the Average statistic to measure average latency.

Units: Milliseconds

QueuedWrites

The number of bytes waiting to be written to AWS, sampled at the end of the reporting period for all volumes in the gateway. These bytes are kept in your gateway's working storage.

Units: Bytes

CloudBytesDownloaded

The total number of bytes that the gateway downloaded from AWS during the reporting period.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

CloudBytesUploaded

The total number of bytes that the gateway uploaded to AWS during the reporting period.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

CloudDownloadLatency

The total number of milliseconds spent reading data from AWS during the reporting period.

Use this metric with the Average statistic to measure average latency.

Units: Milliseconds

WorkingStoragePercentUsed

Percent utilization of the gateway's working storage. The sample is taken at the end of the reporting period.

Units: Percent

WorkingStorageUsed

The total number of bytes being used in the gateway's working storage. The sample is taken at the end of the reporting period.

Units: Bytes

WorkingStorageFree

The total amount of unused space in the gateway's working storage. The sample is taken at the end of the reporting period.

Units: Bytes

The following table describes the AWS Storage Gateway metrics that you can use to get information about your storage volumes. Specify the VolumeId dimension for each metric to view the data for a storage volume.

Metric

Description

ReadBytes

The total number of bytes read from your on-premises applications in the reporting period.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

WriteBytes

The total number of bytes written to your on-premises applications in the reporting period.

Use this metric to measure throughput by selecting the Sum statistic and dividing by the Period. Use this metric to measure operations rate (IOPS) by selecting the Samples statistic and dividing each data point by the Period.

Units: Bytes

ReadTime

The total number of milliseconds spent to do reads from your on-premises applications in the reporting period.

Use this metric with the Average statistic to measure average latency.

Units: Milliseconds

WriteTime

The total number of milliseconds spent to do writes from your on-premises applications in the reporting period.

Use this metric with the Average statistic to measure average latency.

Units: Milliseconds

QueuedWrites

The number of bytes waiting to be written to AWS, sampled at the end of the reporting period.

Units: Bytes
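
The throughput and IOPS guidance in the tables above comes down to dividing a statistic by the period. The following Python sketch works through that arithmetic for a single data point. The function names and the sample values are hypothetical, and the dictionary keys assume the data point exposes the Sum and SampleCount fields (SampleCount is the API name for what the tables call the Samples statistic).

```python
def throughput_bytes_per_second(sum_bytes, period_seconds):
    """Throughput: the Sum statistic divided by the Period."""
    return sum_bytes / period_seconds


def operations_per_second(sample_count, period_seconds):
    """Operations rate (IOPS): the Samples statistic divided by the Period."""
    return sample_count / period_seconds


# Hypothetical ReadBytes data point for one five-minute period.
period = 300  # seconds
datapoint = {"Sum": 157286400.0, "SampleCount": 3000.0}

throughput = throughput_bytes_per_second(datapoint["Sum"], period)
iops = operations_per_second(datapoint["SampleCount"], period)

print(throughput)  # 524288.0 bytes per second
print(iops)        # 10.0 operations per second
```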

Dimensions for AWS Storage Gateway Metrics

The Amazon CloudWatch namespace for the service is AWS/StorageGateway. Data is available automatically in 5-minute periods at no charge.

Dimension

Description

GatewayId, GatewayName

These dimensions filter the data you request to gateway-specific metrics. You can identify a gateway to work with by its GatewayId or its GatewayName. However, if the name of your gateway changed during the time range for which you want to view metrics, use the GatewayId.

Throughput and latency data of a gateway is based on all the volumes for the gateway. For information about working with gateway metrics, see Measuring Performance Between Your Gateway and AWS.

VolumeId

This dimension filters the data you request to volume-specific metrics. Identify a storage volume to work with by its VolumeId. For information about working with volume metrics, see Measuring Performance Between Your Application and Gateway.

Auto Scaling Dimensions and Metrics

This section discusses the metrics that Auto Scaling instances and groups send to Amazon CloudWatch and describes how to enable detailed (one-minute) monitoring and basic (five-minute) monitoring.

Auto Scaling Instance Support

This section discusses the metrics that Auto Scaling instances send to Amazon CloudWatch. Instance metrics are the metrics that an individual Amazon EC2 instance sends to Amazon CloudWatch. Instance metrics are the same metrics available for any Amazon EC2 instance, whether or not it is in an Auto Scaling group.

Amazon CloudWatch offers basic or detailed monitoring. Basic monitoring sends aggregated data about each instance to Amazon CloudWatch every five minutes. Detailed monitoring offers more frequent aggregated data by sending data from each instance every minute.

Note

Selecting detailed monitoring is a prerequisite for the collection of Auto Scaling group metrics. For more information, see Auto Scaling Group Support.

The following sections describe how to enable either detailed monitoring or basic monitoring.

Activating Detailed Instance Monitoring for Auto Scaling

To enable detailed instance monitoring for a new Auto Scaling group, you don't need to take any extra steps. One of your first steps when creating an Auto Scaling group is to create a launch configuration. Each launch configuration contains a flag named InstanceMonitoring.Enabled. The default value of this flag is true, so you don't need to set this flag if you want detailed monitoring.

If you have an Auto Scaling group for which you have explicitly selected basic monitoring, the switch to detailed monitoring involves several steps, especially if you have Amazon CloudWatch alarms configured to scale the group automatically.

To switch to detailed instance monitoring for an existing Auto Scaling group

  1. Create a launch configuration that has the InstanceMonitoring.Enabled flag enabled. If you are using the command line tools, create a launch configuration with the --monitoring-enabled option.

  2. Call UpdateAutoScalingGroup to update your Auto Scaling group with the launch configuration you created in the previous step. Auto Scaling will enable detailed monitoring for new instances that it creates.

  3. Choose one of the following actions to deal with all existing Amazon EC2 instances in the Auto Scaling group:

    To preserve existing instances: Call MonitorInstances from the Amazon EC2 API for each existing instance to enable detailed monitoring.

    To terminate existing instances: Call TerminateInstanceInAutoScalingGroup from the Auto Scaling API for each existing instance. Auto Scaling will use the updated launch configuration to create replacement instances with detailed monitoring enabled.

  4. If you have Amazon CloudWatch alarms associated with your Auto Scaling group, call PutMetricAlarm from the Amazon CloudWatch API to update each alarm so that the alarm's period value is set to 60 seconds.
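
The procedure above can be sketched as an ordered list of API calls. The following Python function is a hypothetical planner, not an AWS tool: it returns (action, parameters) pairs and never contacts AWS. The action names come from the steps above. The parameter shapes are illustrative assumptions, and only the settings relevant to monitoring are shown; real requests need further required fields (for example, a launch configuration also needs an image and instance type, and PutMetricAlarm needs the full alarm definition).

```python
def plan_switch_to_detailed(group_name, launch_config_name,
                            instance_ids, alarm_names, preserve=True):
    """Return the ordered API calls for switching to detailed monitoring."""
    steps = [
        # Step 1: new launch configuration with detailed monitoring on.
        ("CreateLaunchConfiguration", {
            "LaunchConfigurationName": launch_config_name,
            "InstanceMonitoring": {"Enabled": True},
        }),
        # Step 2: point the group at the new launch configuration.
        ("UpdateAutoScalingGroup", {
            "AutoScalingGroupName": group_name,
            "LaunchConfigurationName": launch_config_name,
        }),
    ]
    # Step 3: handle each existing instance, one call per instance.
    for instance_id in instance_ids:
        if preserve:
            steps.append(("MonitorInstances",
                          {"InstanceIds": [instance_id]}))
        else:
            steps.append(("TerminateInstanceInAutoScalingGroup", {
                "InstanceId": instance_id,
                # Assumed False so that Auto Scaling launches a replacement.
                "ShouldDecrementDesiredCapacity": False,
            }))
    # Step 4: align each alarm with one-minute data.
    for alarm_name in alarm_names:
        steps.append(("PutMetricAlarm",
                      {"AlarmName": alarm_name, "Period": 60}))
    return steps


# Hypothetical group, instances, and alarm.
plan = plan_switch_to_detailed("my-group", "my-lc-detailed",
                               ["i-11111111", "i-22222222"], ["my-alarm"])
for action, parameters in plan:
    print(action, parameters)
```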

Activating Basic Instance Monitoring for Auto Scaling

To create a new Auto Scaling group with basic monitoring instead of detailed monitoring, associate your new Auto Scaling group with a launch configuration that has the InstanceMonitoring.Enabled flag set to false. If you are using the command line tools, create a launch configuration with the --monitoring-disabled option.

To switch to basic instance monitoring for an existing Auto Scaling group

  1. Create a launch configuration that has the InstanceMonitoring.Enabled flag disabled. If you are using the command line tools, create a launch configuration with the --monitoring-disabled option.

  2. If you previously enabled group metrics with a call to EnableMetricsCollection, call DisableMetricsCollection on your Auto Scaling group to disable collection of all group metrics. For more information, see Auto Scaling Group Support.

  3. Call UpdateAutoScalingGroup to update your Auto Scaling group with the launch configuration you created in the previous step. Auto Scaling will disable detailed monitoring for new instances that it creates.

  4. Choose one of the following actions to deal with all existing Amazon EC2 instances in the Auto Scaling group:

    To preserve existing instances: Call UnmonitorInstances from the Amazon EC2 API for each existing instance to disable detailed monitoring.

    To terminate existing instances: Call TerminateInstanceInAutoScalingGroup from the Auto Scaling API for each existing instance. Auto Scaling will use the updated launch configuration to create replacement instances with detailed monitoring disabled.

  5. If you have Amazon CloudWatch alarms associated with your Auto Scaling group, call PutMetricAlarm from the Amazon CloudWatch API to update each alarm so that the alarm's period value is set to 300 seconds.

    Important

    If you do not update your alarms to match the five-minute data aggregations, your alarms will continue to check for statistics every minute and might find no data available for as many as four out of every five periods.
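
The mismatch described in the Important note can be checked with a small simulation. The following Python sketch uses a simplified model, which is an assumption: each five-minute data point is treated as a single timestamp at the start of its interval, and an alarm period counts as empty when no timestamp falls inside it. The function name empty_periods is hypothetical.

```python
def empty_periods(datapoint_times, alarm_period, window):
    """Count alarm periods in [0, window) that contain no data point."""
    empty = 0
    for start in range(0, window, alarm_period):
        end = start + alarm_period
        if not any(start <= t < end for t in datapoint_times):
            empty += 1
    return empty


# Basic monitoring: one data point every 300 seconds for 25 minutes.
window = 1500
five_minute_data = list(range(0, window, 300))  # 0, 300, 600, 900, 1200

# Alarm still using a 60-second period: 20 of 25 periods are empty,
# which is four out of every five.
print(empty_periods(five_minute_data, 60, window))   # 20

# Alarm updated to a 300-second period: every period has data.
print(empty_periods(five_minute_data, 300, window))  # 0
```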

For more information on instance metrics for Amazon EC2 instances, see Amazon Elastic Compute Cloud Dimensions and Metrics.

Auto Scaling Group Support

Group metrics are metrics that an Auto Scaling group sends to Amazon CloudWatch to describe the group rather than any of its instances. If you enable group metrics, Auto Scaling sends aggregated data to Amazon CloudWatch every minute. If you disable group metrics, Auto Scaling does not send any group metrics data to Amazon CloudWatch.

To enable group metrics

  1. Enable detailed instance monitoring for the Auto Scaling group by setting the InstanceMonitoring.Enabled flag in the Auto Scaling group's launch configuration. For more information, see Auto Scaling Instance Support.

  2. Call EnableMetricsCollection, which is part of the Auto Scaling Query API. Alternatively, you can use the equivalent as-enable-metrics-collection command that is part of the Auto Scaling command line tools.

Auto Scaling group metrics table

You can enable or disable each of the following metrics separately.

Metric

Description

GroupMinSize

The minimum size of the Auto Scaling group.

GroupMaxSize

The maximum size of the Auto Scaling group.

GroupDesiredCapacity

The number of instances that the Auto Scaling group attempts to maintain.

GroupInServiceInstances

The number of instances that are running as part of the Auto Scaling group. This metric does not include instances that are pending or terminating.

GroupPendingInstances

The number of instances that are pending. A pending instance is not yet in service. This metric does not include instances that are in service or terminating.

GroupTerminatingInstances

The number of instances that are in the process of terminating. This metric does not include instances that are in service or pending.

GroupTotalInstances

The total number of instances in the Auto Scaling group. This metric identifies the number of instances that are in service, pending, and terminating.
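
Because the group metrics can be enabled individually, a request to EnableMetricsCollection can name just the metrics of interest. The following Python sketch builds such a request as a plain dictionary and validates the names against the table above. The helper name and the group name are hypothetical. The parameter names (AutoScalingGroupName, Metrics, Granularity) and the granularity value 1Minute are assumed from the Auto Scaling API, and the sketch does not call AWS.

```python
# Metric names from the table above.
GROUP_METRICS = (
    "GroupMinSize",
    "GroupMaxSize",
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupPendingInstances",
    "GroupTerminatingInstances",
    "GroupTotalInstances",
)


def enable_metrics_request(group_name, metrics=None):
    """Build EnableMetricsCollection parameters.

    With metrics=None, every group metric in the table is requested.
    """
    selected = list(GROUP_METRICS) if metrics is None else list(metrics)
    unknown = [name for name in selected if name not in GROUP_METRICS]
    if unknown:
        raise ValueError("unknown group metrics: %s" % ", ".join(unknown))
    return {
        "AutoScalingGroupName": group_name,
        "Metrics": selected,
        "Granularity": "1Minute",  # group metrics are sent every minute
    }


# Enable only two of the seven metrics for a hypothetical group.
request = enable_metrics_request(
    "my-group", ["GroupInServiceInstances", "GroupPendingInstances"])
```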

Dimensions for Auto Scaling Group Metrics

The only dimension that Auto Scaling sends to Amazon CloudWatch is the name of the Auto Scaling group. This means that all available statistics are filtered by Auto Scaling group name.

Elastic Load Balancing Dimensions and Metrics

This section discusses the metrics and dimensions that Elastic Load Balancing sends to Amazon CloudWatch. Amazon CloudWatch provides detailed monitoring of Elastic Load Balancing by default. Unlike Amazon EC2, you do not need to specifically enable detailed monitoring.

Note

Elastic Load Balancing only emits Amazon CloudWatch metrics when requests are flowing through the load balancer. Elastic Load Balancing measures and sends metrics for that load balancer in 60-second intervals.

Elastic Load Balancing Metrics

The following Elastic Load Balancing metrics are available from Amazon CloudWatch.

The HTTP response code metrics reflect the count of Elastic Load Balancing response codes that are sent to clients within a given time period. If no response codes in the 2XX-5XX range are sent to clients within the given time period, values for these metrics will not be recorded in CloudWatch.

Metric

Description

Latency

Time elapsed from when the request leaves the load balancer until the load balancer receives the corresponding response.

Units: Seconds

Valid Statistics: Minimum, Maximum, Average, and Count

RequestCount

The number of requests handled by the load balancer.

Units: Count

Valid Statistics: Sum

HealthyHostCount

The number of healthy Amazon EC2 instances registered with the load balancer in a specified Availability Zone. Hosts that have not failed more health checks than the value of the unhealthy threshold are considered healthy. When evaluating this metric, the dimensions must be provided for LoadBalancerName and AvailabilityZone. The metric represents the count of healthy instances in the specified Availability Zone. Instances may become unhealthy due to connectivity issues, health checks returning non-200 responses (in the case of HTTP or HTTPS health checks), or timeouts when performing the health check. To get the total count of all healthy hosts, this metric must be retrieved for each registered Availability Zone and then all the metrics need to be added together.

Units: Count

Valid Statistics: Minimum, Maximum, and Average

UnHealthyHostCount

The number of unhealthy Amazon EC2 instances registered with the load balancer in a specified Availability Zone. Hosts that have failed more health checks than the value of the unhealthy threshold are considered unhealthy. When evaluating this metric, the dimensions must be provided for LoadBalancerName and AvailabilityZone. The metric represents the count of unhealthy instances in the specified Availability Zone. Instances may become unhealthy due to connectivity issues, health checks returning non-200 responses (in the case of HTTP or HTTPS health checks), or timeouts when performing the health check. To get the total count of all unhealthy hosts, this metric must be retrieved for each registered Availability Zone and then all the metrics need to be added together.

Units: Count

Valid Statistics: Minimum, Maximum, and Average

HTTPCode_ELB_4XX

Count of HTTP response codes generated by Elastic Load Balancing that are in the 4xx (client error) series.

Units: Count

Valid Statistics: Sum

HTTPCode_ELB_5XX

Count of HTTP response codes generated by Elastic Load Balancing that are in the 5xx (server error) series. Elastic Load Balancing may generate 5xx errors if no back-end instances are registered, if there are no healthy back-end instances, or if the request rate exceeds Elastic Load Balancing's current available capacity. This response count does not include any responses that were generated by back-end instances.

Units: Count

Valid Statistics: Sum

HTTPCode_Backend_2XX

Count of HTTP response codes generated by back-end instances that are in the 2xx (success) series.

Units: Count

Valid Statistics: Sum

HTTPCode_Backend_3XX

Count of HTTP response codes generated by back-end instances that are in the 3xx (redirection) series.

Units: Count

Valid Statistics: Sum

HTTPCode_Backend_4XX

Count of HTTP response codes generated by back-end instances that are in the 4xx (client error) series. This response count does not include any responses that were generated by Elastic Load Balancing.

Units: Count

Valid Statistics: Sum

HTTPCode_Backend_5XX

Count of HTTP response codes generated by back-end instances that are in the 5xx (server error) series. This response count does not include any responses that were generated by Elastic Load Balancing.

Units: Count

Valid Statistics: Sum

Dimensions for Elastic Load Balancing Metrics

You can use the currently available dimensions for Elastic Load Balancing to refine the metrics returned by a query. For example, you could use HealthyHostCount with the dimensions LoadBalancerName and AvailabilityZone to get the average number of healthy instances behind the specified load balancer within the specified Availability Zone for a given period of time.

Elastic Load Balancing data can be aggregated along any of the dimensions in the following table.

Dimension

Description

LoadBalancerName

Limits the metric data to Amazon EC2 instances that are connected to the specified load balancer.

AvailabilityZone

Limits the metric data to load balancers in the specified Availability Zone.
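
The HealthyHostCount description says that the total across a load balancer is obtained by retrieving the metric for each registered Availability Zone and adding the results. The following Python sketch shows that aggregation. The function names, load balancer name, zone names, and counts are all hypothetical, and the per-zone values stand in for results that would come from one query per zone.

```python
def healthy_host_dimensions(load_balancer_name, availability_zone):
    """Dimensions for one per-zone HealthyHostCount query."""
    return [
        {"Name": "LoadBalancerName", "Value": load_balancer_name},
        {"Name": "AvailabilityZone", "Value": availability_zone},
    ]


def total_healthy_hosts(per_zone_counts):
    """Add the per-zone counts to get the load balancer total."""
    return sum(per_zone_counts.values())


# One set of dimensions per registered zone (hypothetical names).
zones = ["us-east-1a", "us-east-1b"]
queries = [healthy_host_dimensions("my-load-balancer", z) for z in zones]

# Hypothetical results of those per-zone queries.
per_zone = {"us-east-1a": 3, "us-east-1b": 2}
print(total_healthy_hosts(per_zone))  # 5
```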