Metrics collected by the CloudWatch agent
You can collect metrics from servers by installing the CloudWatch agent on the server. You can install the agent on both Amazon EC2 instances and on-premises servers. You can also install the agent on computers running Linux, Windows Server, or macOS. If you install the agent on an Amazon EC2 instance, the metrics the agent collects are in addition to the metrics enabled by default on Amazon EC2 instances. For information about installing the CloudWatch agent on an instance, see Collect metrics, logs, and traces with the CloudWatch agent. You can use this section to learn about metrics the CloudWatch agent collects.
Metrics collected by the CloudWatch agent on Windows Server instances
On a server running Windows Server, installing the CloudWatch agent enables you to collect the metrics associated with the counters in Windows Performance Monitor. The CloudWatch metric names for these counters are created by putting a space between the object name and the counter name. For example, the % Interrupt Time counter of the Processor object is given the metric name Processor % Interrupt Time in CloudWatch. For more information about Windows Performance Monitor counters, see the Microsoft Windows Server documentation.
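The naming rule above can be sketched in a few lines of Python. The function name cloudwatch_metric_name is hypothetical and exists only for illustration; it is not part of the CloudWatch agent.

```python
def cloudwatch_metric_name(object_name: str, counter_name: str) -> str:
    """Join a Performance Monitor object name and counter name with one space."""
    return f"{object_name} {counter_name}"


# The example from the text: the "% Interrupt Time" counter of the "Processor" object.
print(cloudwatch_metric_name("Processor", "% Interrupt Time"))
```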
The default namespace for metrics collected by the CloudWatch agent is CWAgent, although you can specify a different namespace when you configure the agent.
Metrics collected by the CloudWatch agent on Linux and macOS instances
The following table lists the metrics that you can collect with the CloudWatch agent on Linux servers and macOS computers.
Metric | Description |
---|---|
|
The amount of time that the CPU is active in any capacity. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is running a virtual CPU for a guest operating system. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is running a virtual CPU for a guest operating system, which is low-priority and can be interrupted by other processes. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is idle. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is waiting for I/O operations to complete. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is servicing interrupts. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is in user mode with low-priority processes, which can easily be interrupted by higher-priority processes. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is servicing software interrupts. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is in stolen time, which is time spent in other operating systems in a virtualized environment. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is in system mode. This metric is measured in hundredths of a second. Unit: None |
|
The amount of time that the CPU is in user mode. This metric is measured in hundredths of a second. Unit: None |
|
The percentage of time that the CPU is active in any capacity. Unit: Percent |
|
The percentage of time that the CPU is running a virtual CPU for a guest operating system. Unit: Percent |
|
The percentage of time that the CPU is running a virtual CPU for a guest operating system, which is low-priority and can be interrupted by other processes. Unit: Percent |
|
The percentage of time that the CPU is idle. Unit: Percent |
|
The percentage of time that the CPU is waiting for I/O operations to complete. Unit: Percent |
|
The percentage of time that the CPU is servicing interrupts. Unit: Percent |
|
The percentage of time that the CPU is in user mode with low-priority processes, which higher-priority processes can easily interrupt. Unit: Percent |
|
The percentage of time that the CPU is servicing software interrupts. Unit: Percent |
|
The percentage of time that the CPU is in stolen time, or time spent in other operating systems in a virtualized environment. Unit: Percent |
|
The percentage of time that the CPU is in system mode. Unit: Percent |
|
The percentage of time that the CPU is in user mode. Unit: Percent |
|
Free space on the disks. Unit: Bytes |
|
The number of available index nodes on the disk. Unit: Count |
|
The total number of index nodes reserved on the disk. Unit: Count |
|
The number of used index nodes on the disk. Unit: Count |
|
Total space on the disks, including used and free. Unit: Bytes |
|
Used space on the disks. Unit: Bytes |
|
The percentage of total disk space that is used. Unit: Percent |
|
The number of I/O requests that have been issued to the device driver but have not yet completed. Unit: Count |
|
The amount of time that the disk has had I/O requests queued. Unit: Milliseconds The only statistic that should be used for this metric is |
|
The number of disk read operations. Unit: Count The only statistic that should be used for this metric is |
|
The number of bytes read from the disks. Unit: Bytes The only statistic that should be used for this metric is |
|
The amount of time that read requests have waited on the disks. Multiple read requests waiting at the same time increase the number. For example, if 5 requests all wait for an average of 100 milliseconds, 500 is reported. Unit: Milliseconds The only statistic that should be used for this metric is |
|
The number of disk write operations. Unit: Count The only statistic that should be used for this metric is |
|
The number of bytes written to the disks. Unit: Bytes The only statistic that should be used for this metric is |
|
The amount of time that write requests have waited on the disks. Multiple write requests waiting at the same time increase the number. For example, if 8 requests all wait for an average of 1000 milliseconds, 8000 is reported. Unit: Milliseconds The only statistic that should be used for this metric is |
|
The number of packets queued and/or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance. This metric is collected only if you have listed it in the Unit: None |
|
The number of packets queued and/or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance. This metric is collected only if you have listed it in the Unit: None |
|
The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance. This metric is collected only if you have listed it in the Unit: None |
|
The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service. This metric is collected only if you have listed it in the Unit: None |
|
The number of packets queued and/or dropped because the bidirectional PPS exceeded the maximum for the instance. This metric is collected only if you have listed it in the Unit: None |
|
The amount of memory that has been used in some way during the last sample period. Unit: Bytes |
|
The amount of memory that is available and can be given instantly to processes. Unit: Bytes |
|
The percentage of memory that is available and can be given instantly to processes. Unit: Percent |
|
The amount of memory that is being used for buffers. Unit: Bytes |
|
The amount of memory that is being used for file caches. Unit: Bytes |
|
The amount of memory that isn't being used. Unit: Bytes |
|
The amount of memory that hasn't been used in some way during the last sample period. Unit: Bytes |
|
The total amount of memory. Unit: Bytes |
|
The amount of memory currently in use. Unit: Bytes |
|
The percentage of memory currently in use. Unit: Percent |
|
The number of bytes received by the network interface. Unit: Bytes The only statistic that should be used for this metric is |
|
The number of bytes sent by the network interface. Unit: Bytes The only statistic that should be used for this metric is |
|
The number of packets received by this network interface that were dropped. Unit: Count The only statistic that should be used for this metric is |
|
The number of packets transmitted by this network interface that were dropped. Unit: Count The only statistic that should be used for this metric is |
|
The number of receive errors detected by this network interface. Unit: Count The only statistic that should be used for this metric is |
|
The number of transmit errors detected by this network interface. Unit: Count The only statistic that should be used for this metric is |
|
The number of packets sent by this network interface. Unit: Count The only statistic that should be used for this metric is |
|
The number of packets received by this network interface. Unit: Count The only statistic that should be used for this metric is |
|
The number of TCP connections with no state. Unit: Count |
|
The number of TCP connections waiting for a termination request from the client. Unit: Count |
|
The number of TCP connections that are waiting for a termination request with acknowledgement from the client. Unit: Count |
|
The number of TCP connections established. Unit: Count |
|
The number of TCP connections in the Unit: Count |
|
The number of TCP connections in the Unit: Count |
|
The number of TCP connections waiting for the client to send acknowledgement of the connection termination message. This is the last state right before the connection is closed down. Unit: Count |
|
The number of TCP ports currently listening for a connection request. Unit: Count |
|
The number of TCP connections with inactive clients. Unit: Count |
|
The number of TCP connections waiting for a matching connection request after having sent a connection request. Unit: Count |
|
The number of TCP connections waiting for connection request acknowledgement after having sent and received a connection request. Unit: Count |
|
The number of TCP connections currently waiting to ensure that the client received the acknowledgement of its connection termination request. Unit: Count |
|
The number of current UDP connections. Unit: Count |
|
The number of processes that are blocked. Unit: Count |
|
The number of processes that are dead, indicated by the This metric is not collected on macOS computers. Unit: Count |
|
The number of processes that are idle (sleeping for more than 20 seconds). Available only on FreeBSD instances. Unit: Count |
|
The number of processes that are paging, indicated by the This metric is not collected on macOS computers. Unit: Count |
|
The number of processes that are running, indicated by the Unit: Count |
|
The number of processes that are sleeping, indicated by the Unit: Count |
|
The number of processes that are stopped, indicated by the Unit: Count |
|
The total number of processes on the instance. Unit: Count |
|
The total number of threads making up the processes. This metric is available only on Linux instances; it is not collected on macOS computers. Unit: Count |
|
The number of processes that are paging, indicated by the Unit: Count |
|
The number of zombie processes, indicated by the Unit: Count |
|
The amount of swap space that isn't being used. Unit: Bytes |
|
The amount of swap space currently in use. Unit: Bytes |
|
The percentage of swap space currently in use. Unit: Percent |
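The disk read and write wait-time descriptions in the table say that requests waiting at the same time add to the reported number. A minimal sketch of that arithmetic follows, assuming the reported value is simply the request count multiplied by the average wait; the function name total_wait_ms is hypothetical.

```python
def total_wait_ms(request_count: int, average_wait_ms: int) -> int:
    """Accumulated wait time when every request waits the same average time."""
    return request_count * average_wait_ms


# The two examples from the table descriptions.
print(total_wait_ms(5, 100))    # read example: 5 requests, 100 ms each
print(total_wait_ms(8, 1000))   # write example: 8 requests, 1000 ms each
```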
Definitions of memory metrics collected by the CloudWatch agent
When the CloudWatch agent collects memory metrics, the source is the host's memory management subsystem.
For example, the Linux kernel exposes OS-maintained data in /proc. For memory, the data is in /proc/meminfo.
Each operating system and architecture calculates the resources used by processes differently. For more information, see the following sections.
During each collection interval, the CloudWatch agent on each instance collects the instance resources and calculates the resources being used by all processes that are running on that instance. This information is reported to CloudWatch as metrics. You can configure the length of the collection interval in the CloudWatch agent configuration file. For more information, see CloudWatch agent configuration file: Agent section.
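As a rough illustration, the following Python snippet builds a configuration fragment that sets a collection interval and a custom namespace. The key names reflect the agent configuration schema as commonly documented, and the values (60 seconds, MyCustomNamespace, and the two mem measurements) are hypothetical; check the configuration file reference before relying on any of them.

```python
import json

# Hypothetical values for illustration only; verify the key names against
# the CloudWatch agent configuration file reference.
agent_config = {
    "agent": {
        # Assumed to be the collection interval, in seconds.
        "metrics_collection_interval": 60,
    },
    "metrics": {
        # Overrides the default CWAgent namespace.
        "namespace": "MyCustomNamespace",
        "metrics_collected": {
            "mem": {
                "measurement": ["mem_used_percent", "mem_available_percent"],
            },
        },
    },
}

print(json.dumps(agent_config, indent=2))
```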
The following list explains how the memory metrics that the CloudWatch agent collects are defined.
Active Memory – Memory that is being used by a process. In other words, the memory used by currently running applications.
Available Memory – The memory that can be instantly given to processes without the system going into swap (also known as virtual memory).
Buffer Memory – The data area shared by hardware devices or program processes that operate at different speeds and priorities.
Cached Memory – Stores program instructions and data that are used repeatedly in the operation of programs and that the CPU is likely to need next.
Free Memory – Memory that is not being used at all and is readily available. It is completely free for the system to use when needed.
Inactive Memory – Pages that have not been accessed "recently".
Total Memory – The size of the physical memory (RAM).
Used Memory – Memory that is currently in use by programs and processes.
Linux: Metrics collected and calculations used
Metrics collected and units:
Active (Bytes)
Available (Bytes)
Available Percent (Percent)
Buffered (Bytes)
Cached (Bytes)
Free (Bytes)
Inactive (Bytes)
Total (Bytes)
Used (Bytes)
Used Percent (Percent)
Used memory = Total memory - Free memory - Cached memory - Buffer memory
Total memory = Used memory + Free memory + Cached memory + Buffer memory
macOS: Metrics collected and calculations used
Metrics collected and units:
Active (Bytes)
Available (Bytes)
Available Percent (Percent)
Free (Bytes)
Inactive (Bytes)
Total (Bytes)
Used (Bytes)
Used Percent (Percent)
Available memory = Free memory + Inactive memory
Used memory = Total memory - Available memory
Total memory = Available memory + Used memory
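The macOS relationships above can be sketched as follows. The function name, the sample byte counts, and the percent formulas (value divided by total, times 100) are assumptions made for illustration, not values or code taken from the agent.

```python
def macos_memory_metrics(total: int, free: int, inactive: int) -> dict:
    """Derive the macOS memory figures from total, free, and inactive bytes."""
    available = free + inactive          # Available = Free + Inactive
    used = total - available             # Used = Total - Available
    return {
        "available": available,
        "used": used,
        "total": available + used,       # Total = Available + Used
        # Assumed percent definitions: share of total memory.
        "available_percent": available / total * 100,
        "used_percent": used / total * 100,
    }


GIB = 1024 ** 3
# Hypothetical host: 16 GiB total, 2 GiB free, 4 GiB inactive.
metrics = macos_memory_metrics(total=16 * GIB, free=2 * GIB, inactive=4 * GIB)
print(metrics)
```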
Windows: Metrics collected
The metrics collected on Windows hosts are listed below. All of these metrics have None for Unit.
Available bytes
Cache Faults/sec
Page Faults/sec
Pages/sec
There are no calculations used for Windows metrics because the CloudWatch agent parses events from performance counters.
Example: Calculating memory metrics on Linux
As an example, suppose that entering the cat /proc/meminfo command on a Linux host shows the following results:
MemTotal: 3824388 kB
MemFree: 462704 kB
MemAvailable: 2157328 kB
Buffers: 126268 kB
Cached: 1560520 kB
SReclaimable: 289080 kB
In this example, the CloudWatch agent will collect the following values. All the values that the CloudWatch agent collects and reports are in bytes.
mem_total: 3916173312 bytes
mem_available: 2209103872 bytes (MemAvailable)
mem_free: 473808896 bytes
mem_cached: 1893990400 bytes (Cached + SReclaimable)
mem_used: 1419075584 bytes (MemTotal - (MemFree + Buffers + (Cached + SReclaimable)))
mem_buffered: 129298432 bytes
mem_available_percent: 56.41%
mem_used_percent: 36.24% ((mem_used / mem_total) * 100)
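The arithmetic in this example can be reproduced with a short Python sketch. It is not the agent's implementation. It assumes that each kB value in /proc/meminfo is multiplied by 1024 and that mem_available is read from the MemAvailable field, which matches the reported 2209103872 bytes. The function names are hypothetical.

```python
MEMINFO = """\
MemTotal:        3824388 kB
MemFree:          462704 kB
MemAvailable:    2157328 kB
Buffers:          126268 kB
Cached:          1560520 kB
SReclaimable:     289080 kB
"""


def parse_meminfo(text: str) -> dict:
    """Parse 'Key: value kB' lines into a dict of byte counts."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0]) * 1024  # kB to bytes
    return values


def linux_memory_metrics(m: dict) -> dict:
    """Apply the Linux formulas from this section to parsed meminfo values."""
    cached = m["Cached"] + m["SReclaimable"]
    used = m["MemTotal"] - (m["MemFree"] + m["Buffers"] + cached)
    return {
        "mem_total": m["MemTotal"],
        "mem_free": m["MemFree"],
        "mem_available": m["MemAvailable"],  # assumption: taken directly from MemAvailable
        "mem_buffered": m["Buffers"],
        "mem_cached": cached,
        "mem_used": used,
        "mem_available_percent": m["MemAvailable"] / m["MemTotal"] * 100,
        "mem_used_percent": used / m["MemTotal"] * 100,
    }


result = linux_memory_metrics(parse_meminfo(MEMINFO))
for name, value in result.items():
    print(name, round(value, 2))
```

Rounded to two decimal places, the two percentages come out to 56.41 and 36.24, in line with the figures listed above.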