Overview of Enhanced Monitoring - Amazon Relational Database Service

Overview of Enhanced Monitoring

Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on. You can view all the system metrics and process information for your RDS DB instances on the console. You can manage which metrics you want to monitor for each instance and customize the dashboard according to your requirements.

RDS delivers the metrics from Enhanced Monitoring into your Amazon CloudWatch Logs account. You can create metrics filters in CloudWatch from CloudWatch Logs and display the graphs on the CloudWatch dashboard. You can consume the Enhanced Monitoring JSON output from CloudWatch Logs in a monitoring system of your choice. For more information, see Enhanced Monitoring in the Amazon RDS FAQs.

Enhanced Monitoring availability

Enhanced Monitoring is available for the following database engines:

  • MariaDB

  • Microsoft SQL Server

  • MySQL

  • Oracle

  • PostgreSQL

Enhanced Monitoring is available for all DB instance classes except for the db.m1.small instance class.

Enhanced Monitoring metric descriptions

The following tables list the OS metrics available using Amazon CloudWatch Logs.

Metrics for MariaDB, MySQL, Oracle, and PostgreSQL DB instances

Group Metric Console name Description

General

engine

Not applicable

The database engine for the DB instance.

instanceID

Not applicable

The DB instance identifier.

instanceResourceID

Not applicable

An immutable identifier for the DB instance that is unique to an AWS Region, also used as the log stream identifier.

numVCPUs

Not applicable

The number of virtual CPUs for the DB instance.

timestamp

Not applicable

The time at which the metrics were taken.

uptime

Not applicable

The amount of time that the DB instance has been active.

version

Not applicable

The version of the OS metrics' stream JSON format.

cpuUtilization

guest

CPU Guest

The percentage of CPU in use by guest programs.

idle

CPU Idle

The percentage of CPU that is idle.

irq

CPU IRQ

The percentage of CPU in use by software interrupts.

nice

CPU Nice

The percentage of CPU in use by programs running at lowest priority.

steal

CPU Steal

The percentage of CPU in use by other virtual machines.

system

CPU System

The percentage of CPU in use by the kernel.

total

CPU Total

The total percentage of the CPU in use. This value includes the nice value.

user

CPU User

The percentage of CPU in use by user programs.

wait

CPU Wait

The percentage of CPU unused while waiting for I/O access.

diskIO

avgQueueLen

Avg Queue Size

The number of requests waiting in the I/O device's queue.

avgReqSz

Ave Request Size

The average request size, in kilobytes.

await

Disk I/O Await

The number of milliseconds required to respond to requests, including queue time and service time.

device

Not applicable

The identifier of the disk device in use.

readIOsPS

Read IO/s

The number of read operations per second.

readKb

Read Total

The total number of kilobytes read.

readKbPS

Read Kb/s

The number of kilobytes read per second.

readLatency

Read Latency

The elapsed time between the submission of a read I/O request and its completion, in milliseconds.

This metric is only available for Amazon Aurora.

readThroughput

Read Throughput

The amount of network throughput used by requests to the DB cluster, in bytes per second.

This metric is only available for Amazon Aurora.

rrqmPS

Rrqms

The number of merged read requests queued per second.

tps

TPS

The number of I/O transactions per second.

util

Disk I/O Util

The percentage of CPU time during which requests were issued.

writeIOsPS

Write IO/s

The number of write operations per second.

writeKb

Write Total

The total number of kilobytes written.

writeKbPS

Write Kb/s

The number of kilobytes written per second.

writeLatency

Write Latency

The average elapsed time between the submission of a write I/O request and its completion, in milliseconds.

This metric is only available for Amazon Aurora.

writeThroughput

Write Throughput

The amount of network throughput used by responses from the DB cluster, in bytes per second.

This metric is only available for Amazon Aurora.

wrqmPS

Wrqms

The number of merged write requests queued per second.

physicalDeviceIO

avgQueueLen

Physical Devices Avg Queue Size

The number of requests waiting in the I/O device's queue.

avgReqSz

Physical Devices Ave Request Size

The average request size, in kilobytes.

await

Physical Devices Disk I/O Await

The number of milliseconds required to respond to requests, including queue time and service time.

device

Not applicable

The identifier of the disk device in use.

readIOsPS

Physical Devices Read IO/s

The number of read operations per second.

readKb

Physical Devices Read Total

The total number of kilobytes read.

readKbPS

Physical Devices Read Kb/s

The number of kilobytes read per second.

rrqmPS

Physical Devices Rrqms

The number of merged read requests queued per second.

tps

Physical Devices TPS

The number of I/O transactions per second.

util

Physical Devices Disk I/O Util

The percentage of CPU time during which requests were issued.

writeIOsPS

Physical Devices Write IO/s

The number of write operations per second.

writeKb

Physical Devices Write Total

The total number of kilobytes written.

writeKbPS

Physical Devices Write Kb/s

The number of kilobytes written per second.

wrqmPS

Physical Devices Wrqms

The number of merged write requests queued per second.

fileSys

maxFiles

Max Inodes

The maximum number of files that can be created for the file system.

mountPoint

Not applicable

The path to the file system.

name

Not applicable

The name of the file system.

total

Total Filesystem

The total number of disk space available for the file system, in kilobytes.

used

Used Filesystem

The amount of disk space used by files in the file system, in kilobytes.

usedFilePercent

Used %

The percentage of available files in use.

usedFiles

Used Inodes

The number of files in the file system.

usedPercent

Used Inodes %

The percentage of the file-system disk space in use.

loadAverageMinute

fifteen

Load Avg 15 min

The number of processes requesting CPU time over the last 15 minutes.

five

Load Avg 5 min

The number of processes requesting CPU time over the last 5 minutes.

one

Load Avg 1 min

The number of processes requesting CPU time over the last minute.

memory

active

Active Memory

The amount of assigned memory, in kilobytes.

buffers

Buffered Memory

The amount of memory used for buffering I/O requests prior to writing to the storage device, in kilobytes.

cached

Cached Memory

The amount of memory used for caching file system–based I/O.

dirty

Dirty Memory

The amount of memory pages in RAM that have been modified but not written to their related data block in storage, in kilobytes.

free

Free Memory

The amount of unassigned memory, in kilobytes.

hugePagesFree

Huge Pages Free

The number of free huge pages. Huge pages are a feature of the Linux kernel.

hugePagesRsvd

Huge Pages Rsvd

The number of committed huge pages.

hugePagesSize

Huge Pages Size

The size for each huge pages unit, in kilobytes.

hugePagesSurp

Huge Pages Surp

The number of available surplus huge pages over the total.

hugePagesTotal

Huge Pages Total

The total number of huge pages.

inactive

Inactive Memory

The amount of least-frequently used memory pages, in kilobytes.

mapped

Mapped Memory

The total amount of file-system contents that is memory mapped inside a process address space, in kilobytes.

pageTables

Page Tables

The amount of memory used by page tables, in kilobytes.

slab

Slab Memory

The amount of reusable kernel data structures, in kilobytes.

total

Total Memory

The total amount of memory, in kilobytes.

writeback

Writeback Memory

The amount of dirty pages in RAM that are still being written to the backing storage, in kilobytes.

network

interface

Not applicable

The identifier for the network interface being used for the DB instance.

rx

RX

The number of bytes received per second.

tx

TX

The number of bytes uploaded per second.

processList

cpuUsedPc

CPU %

The percentage of CPU used by the process.

id

Not applicable

The identifier of the process.

memoryUsedPc

MEM%

The percentage of memory used by the process.

name

Not applicable

The name of the process.

parentID

Not applicable

The process identifier for the parent process of the process.

rss

RES

The amount of RAM allocated to the process, in kilobytes.

tgid

Not applicable

The thread group identifier, which is a number representing the process ID to which a thread belongs. This identifier is used to group threads from the same process.

vss

VIRT

The amount of virtual memory allocated to the process, in kilobytes.

swap

swap

Swap

The amount of swap memory available, in kilobytes.

swap in

Swaps in

The amount of memory, in kilobytes, swapped in from disk.

swap out

Swaps out

The amount of memory, in kilobytes, swapped out to disk.

free

Free Swap

The amount of swap memory free, in kilobytes.

committed

Committed Swap

The amount of swap memory, in kilobytes, used as cache memory.

tasks

blocked

Tasks Blocked

The number of tasks that are blocked.

running

Tasks Running

The number of tasks that are running.

sleeping

Tasks Sleeping

The number of tasks that are sleeping.

stopped

Tasks Stopped

The number of tasks that are stopped.

total

Tasks Total

The total number of tasks.

zombie

Tasks Zombie

The number of child tasks that are inactive with an active parent task.

Metrics for Microsoft SQL Server DB instances

Group Metric Console name Description

General

engine

Not applicable

The database engine for the DB instance.

instanceID

Not applicable

The DB instance identifier.

instanceResourceID

Not applicable

An immutable identifier for the DB instance that is unique to an AWS Region, also used as the log stream identifier.

numVCPUs

Not applicable

The number of virtual CPUs for the DB instance.

timestamp

Not applicable

The time at which the metrics were taken.

uptime

Not applicable

The amount of time that the DB instance has been active.

version

Not applicable

The version of the OS metrics' stream JSON format.

cpuUtilization

idle

CPU Idle

The percentage of CPU that is idle.

kern

CPU Kernel

The percentage of CPU in use by the kernel.

user

CPU User

The percentage of CPU in use by user programs.

disks

name

Not applicable

The identifier for the disk.

totalKb

Total Disk Space

The total space of the disk, in kilobytes.

usedKb

Used Disk Space

The amount of space used on the disk, in kilobytes.

usedPc

Used Disk Space %

The percentage of space used on the disk.

availKb

Available Disk Space

The space available on the disk, in kilobytes.

availPc

Available Disk Space %

The percentage of space available on the disk.

rdCountPS

Reads/s

The number of read operations per second

rdBytesPS

Read Kb/s

The number of bytes read per second.

wrCountPS

Write IO/s

The number of write operations per second.

wrBytesPS

Write Kb/s

The amount of bytes written per second.

memory

commitTotKb

Commit Total

The amount of pagefile-backed virtual address space in use, that is, the current commit charge. This value is composed of main memory (RAM) and disk (pagefiles).

commitLimitKb

Maximum Commit

The maximum possible value for the commitTotKb metric. This value is the sum of the current pagefile size plus the physical memory available for pageable contents, excluding RAM that is assigned to nonpageable areas.

commitPeakKb

Commit Peak

The largest value of the commitTotKb metric since the operating system was last started.

kernTotKb

Total Kernel Memory

The sum of the memory in the paged and nonpaged kernel pools, in kilobytes.

kernPagedKb

Paged Kernel Memory

The amount of memory in the paged kernel pool, in kilobytes.

kernNonpagedKb

Nonpaged Kerenel Memory

The amount of memory in the nonpaged kernel pool, in kilobytes.

pageSize

Page Size

The size of a page, in bytes.

physTotKb

Total Memory

The amount of physical memory, in kilobytes.

physAvailKb

Available Memory

The amount of available physical memory, in kilobytes.

sqlServerTotKb

SQL Server Total Memory

The amount of memory committed to SQL Server, in kilobytes.

sysCacheKb

System Cache

The amount of system cache memory, in kilobytes.

network

interface

Not applicable

The identifier for the network interface being used for the DB instance.

rdBytesPS

Network Read Kb/s

The number of bytes received per second.

wrBytesPS

Network Write Kb/s

The number of bytes sent per second.

processList

cpuUsedPc

Used %

The percentage of CPU used by the process.

memUsedPc

MEM%

The percentage of total memory used by the process.

name

Not applicable

The name of the process.

pid

Not applicable

The identifier of the process. This value is not present for processes that are owned by Amazon RDS.

ppid

Not applicable

The process identifier for the parent of this process. This value is only present for child processes.

tid

Not applicable

The thread identifier. This value is only present for threads. The owning process can be identified by using the pid value.

workingSetKb

Not applicable

The amount of memory in the private working set plus the amount of memory that is in use by the process and can be shared with other processes, in kilobytes.

workingSetPrivKb

Not applicable

The amount of memory that is in use by a process, but can't be shared with other processes, in kilobytes.

workingSetShareableKb

Not applicable

The amount of memory that is in use by a process and can be shared with other processes, in kilobytes.

virtKb

Not applicable

The amount of virtual address space the process is using, in kilobytes. Use of virtual address space doesn't necessarily imply corresponding use of either disk or main memory pages.

system

handles

Handles

The number of handles that the system is using.

processes

Processes

The number of processes running on the system.

threads

Threads

The number of threads running on the system.

Differences between CloudWatch and Enhanced Monitoring metrics

A hypervisor creates and runs virtual machines (VMs). Using a hypervisor, an instance can support multiple guest VMs by virtually sharing memory and CPU. CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance. In contrast, Enhanced Monitoring gathers its metrics from an agent on the DB instance.

You might find differences between the CloudWatch and Enhanced Monitoring measurements, because the hypervisor layer performs a small amount of work. The differences can be greater if your DB instances use smaller instance classes. In this scenario, more virtual machines (VMs) are probably managed by the hypervisor layer on a single physical instance.

Retention of Enhanced Monitoring metrics

By default, Enhanced Monitoring metrics are stored for 30 days in the CloudWatch Logs. This retention period is different from typical CloudWatch metrics.

To modify the amount of time the metrics are stored in the CloudWatch Logs, change the retention for the RDSOSMetrics log group in the CloudWatch console. For more information, see Change log data retention in CloudWatch logs in the Amazon CloudWatch Logs User Guide.

Cost of Enhanced Monitoring

Enhanced Monitoring metrics are stored in the CloudWatch Logs instead of in CloudWatch metrics. The cost of Enhanced Monitoring depends on the following factors:

  • You are charged for Enhanced Monitoring only if you exceed the free tier provided by Amazon CloudWatch Logs. Charges are based on CloudWatch Logs data transfer and storage rates.

  • The amount of information transferred for an RDS instance is directly proportional to the defined granularity for the Enhanced Monitoring feature. A smaller monitoring interval results in more frequent reporting of OS metrics and increases your monitoring cost. To manage costs, set different granularities for different instances in your accounts.

  • Usage costs for Enhanced Monitoring are applied for each DB instance that Enhanced Monitoring is enabled for. Monitoring a large number of DB instances is more expensive than monitoring only a few.

  • DB instances that support a more compute-intensive workload have more OS process activity to report and higher costs for Enhanced Monitoring.

For more information about pricing, see Amazon CloudWatch pricing.