Enhanced Monitoring - Amazon Relational Database Service

Enhanced Monitoring

Amazon RDS provides metrics in real time for the operating system (OS) that your DB instance runs on. You can view the metrics for your DB instance using the console. Also, you can consume the Enhanced Monitoring JSON output from Amazon CloudWatch Logs in a monitoring system of your choice.

By default, Enhanced Monitoring metrics are stored in CloudWatch Logs for 30 days. They are stored as log data, not as standard CloudWatch metrics. To change how long the metrics are stored in CloudWatch Logs, change the retention for the RDSOSMetrics log group in the CloudWatch console. For more information, see Change Log Data Retention in CloudWatch Logs in the Amazon CloudWatch Logs User Guide.
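As a rough sketch, the same retention change could also be made programmatically. The function below assumes that a boto3 CloudWatch Logs client (for example, one created with `boto3.client("logs")`) is passed in. The function name and the 7-day value in the usage note are illustrative and not part of the RDS documentation.

```python
def set_enhanced_monitoring_retention(logs_client, days):
    """Set how long Enhanced Monitoring data is kept in CloudWatch Logs.

    logs_client is assumed to be a boto3 CloudWatch Logs client.
    CloudWatch Logs accepts only certain retention values (such as
    1, 7, 30, or 365 days) and rejects others.
    """
    logs_client.put_retention_policy(
        logGroupName="RDSOSMetrics",  # log group named in the text above
        retentionInDays=days,
    )
```

For example, `set_enhanced_monitoring_retention(boto3.client("logs"), 7)` would keep one week of OS metrics.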

Because Enhanced Monitoring metrics are stored in CloudWatch Logs instead of in CloudWatch metrics, the cost of Enhanced Monitoring depends on several factors:

  • You are charged only for Enhanced Monitoring usage that exceeds the free tier provided by Amazon CloudWatch Logs.

    For more information about pricing, see Amazon CloudWatch Pricing.

  • A smaller monitoring interval results in more frequent reporting of OS metrics and increases your monitoring cost.

  • Usage costs for Enhanced Monitoring are applied for each DB instance that Enhanced Monitoring is enabled for. Monitoring a large number of DB instances is more expensive than monitoring only a few.

  • DB instances that support a more compute-intensive workload have more OS process activity to report and higher costs for Enhanced Monitoring.
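The interval and instance-count factors above can be made concrete with a little arithmetic. The sketch below only counts how many OS metric samples are reported per day. It does not compute a price, because sample sizes and CloudWatch Logs rates are not given here. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds, instance_count=1):
    """Return how many Enhanced Monitoring samples are reported per day."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive; 0 disables monitoring")
    return (SECONDS_PER_DAY // interval_seconds) * instance_count


# Ten DB instances at a 1-second interval versus a 60-second interval.
print(samples_per_day(1, 10))   # 864000
print(samples_per_day(60, 10))  # 14400
```

Under these assumptions, moving ten instances from a 60-second to a 1-second interval multiplies the number of reported samples by 60.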

Enhanced Monitoring Availability

Enhanced Monitoring is available for the following database engines:

  • MariaDB

  • Microsoft SQL Server

  • MySQL version 5.5 or later

  • Oracle

  • PostgreSQL

Enhanced Monitoring is available for all DB instance classes except for db.m1.small, all db.m6g instance classes, and all db.r6g instance classes.

Differences Between CloudWatch and Enhanced Monitoring Metrics

CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB instance, and Enhanced Monitoring gathers its metrics from an agent on the instance. As a result, you might find differences between the measurements, because the hypervisor layer performs a small amount of work. The differences can be greater if your DB instances use smaller instance classes, because then there are likely more virtual machines (VMs) that are managed by the hypervisor layer on a single physical instance. Enhanced Monitoring metrics are useful when you want to see how different processes or threads on a DB instance use the CPU.

Setting Up for and Enabling Enhanced Monitoring

To set up for and enable Enhanced Monitoring, complete the following steps.

Before You Begin

Enhanced Monitoring requires permission to act on your behalf to send OS metric information to CloudWatch Logs. You grant Enhanced Monitoring the required permissions using an AWS Identity and Access Management (IAM) role.

The first time that you enable Enhanced Monitoring in the console, you can select the Default option for the Monitoring Role property to have RDS create the required IAM role. RDS then automatically creates a role named rds-monitoring-role for you, and uses it for the specified DB instance or read replica.

You can also create the required role before you enable Enhanced Monitoring, and then specify your new role's name when you enable Enhanced Monitoring. You must create this required role if you enable Enhanced Monitoring using the AWS CLI or the RDS API.

To create the appropriate IAM role to permit Amazon RDS to communicate with the Amazon CloudWatch Logs service on your behalf, take the following steps.

The user that enables Enhanced Monitoring must be granted the PassRole permission. For more information, see Example 2 in Granting a User Permissions to Pass a Role to an AWS Service in the IAM User Guide.

To create an IAM role for Amazon RDS Enhanced Monitoring

  1. Open the IAM Console at https://console.aws.amazon.com.

  2. In the navigation pane, choose Roles.

  3. Choose Create role.

  4. Choose the AWS service tab, and then choose RDS from the list of services.

  5. Choose RDS - Enhanced Monitoring, and then choose Next: Permissions.

  6. Ensure that the Attached permissions policy page shows AmazonRDSEnhancedMonitoringRole, and then choose Next: Tags.

  7. On the Add tags page, choose Next: Review.

  8. For Role Name, enter a name for your role, for example, emaccess, and then choose Create role.
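If you plan to enable Enhanced Monitoring from the AWS CLI or the RDS API, the role can also be created in code. The following is a minimal sketch, assuming a boto3 IAM client (for example, `boto3.client("iam")`) is passed in. The function names are illustrative. The trust policy names `monitoring.rds.amazonaws.com` as the service allowed to assume the role, and the attached policy is the AmazonRDSEnhancedMonitoringRole managed policy from step 6.

```python
import json

# Managed policy named in step 6 of the console procedure.
POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"


def monitoring_trust_policy():
    """Trust policy that lets RDS Enhanced Monitoring assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "monitoring.rds.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_monitoring_role(iam_client, role_name="emaccess"):
    """Create the role, attach the managed policy, and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(monitoring_trust_policy()),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=POLICY_ARN)
    return response["Role"]["Arn"]
```

The returned ARN is the value to supply as the monitoring role when you enable Enhanced Monitoring.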

Enabling and Disabling Enhanced Monitoring

You can enable and disable Enhanced Monitoring using the AWS Management Console, AWS CLI, or RDS API.

You can enable Enhanced Monitoring when you create a DB instance or read replica, or when you modify a DB instance. If you modify a DB instance to enable Enhanced Monitoring, you don't need to reboot your DB instance for the change to take effect.

You can enable Enhanced Monitoring in the RDS console when you do one of the following actions:

  • Create a DB instance – You can enable Enhanced Monitoring in the Monitoring section under Additional configuration.

  • Create a read replica – You can enable Enhanced Monitoring in the Monitoring section.

  • Modify a DB instance – You can enable Enhanced Monitoring in the Monitoring section.

To enable Enhanced Monitoring by using the RDS console, scroll to the Monitoring section and do the following:

  1. Choose Enable enhanced monitoring for your DB instance or read replica.

  2. Set the Monitoring Role property to the IAM role that you created to permit Amazon RDS to communicate with Amazon CloudWatch Logs for you, or choose Default to have RDS create a role for you named rds-monitoring-role.

  3. Set the Granularity property to the interval, in seconds, between points when metrics are collected for your DB instance or read replica. The Granularity property can be set to one of the following values: 1, 5, 10, 15, 30, or 60.

To disable Enhanced Monitoring, choose Disable enhanced monitoring.


Figure: Enable Enhanced Monitoring

Enabling Enhanced Monitoring doesn't require your DB instance to restart.

Note

The fastest that the RDS console refreshes is every 5 seconds. If you set the granularity to 1 second in the RDS console, you still see updated metrics only every 5 seconds. You can retrieve 1-second metric updates by using CloudWatch Logs.

To enable Enhanced Monitoring using the AWS CLI, run the create-db-instance, create-db-instance-read-replica, or modify-db-instance command. Set the --monitoring-interval option to a value other than 0, and set the --monitoring-role-arn option to the Amazon Resource Name (ARN) of the role you created in Before You Begin.

The --monitoring-interval option specifies the interval, in seconds, between points when Enhanced Monitoring metrics are collected. Valid values for the option are 0, 1, 5, 10, 15, 30, and 60.

To disable Enhanced Monitoring using the AWS CLI, set the --monitoring-interval option to 0 in these commands.

Example

The following example enables Enhanced Monitoring for a DB instance:

For Linux, macOS, or Unix:

    aws rds modify-db-instance \
        --db-instance-identifier mydbinstance \
        --monitoring-interval 30 \
        --monitoring-role-arn arn:aws:iam::123456789012:role/emaccess

For Windows:

    aws rds modify-db-instance ^
        --db-instance-identifier mydbinstance ^
        --monitoring-interval 30 ^
        --monitoring-role-arn arn:aws:iam::123456789012:role/emaccess

To enable Enhanced Monitoring using the RDS API, call the CreateDBInstance, CreateDBInstanceReadReplica, or ModifyDBInstance operation. Set the MonitoringInterval parameter to a value other than 0, and set the MonitoringRoleArn parameter to the Amazon Resource Name (ARN) of the role you created in Before You Begin.

The MonitoringInterval parameter specifies the interval, in seconds, between points when Enhanced Monitoring metrics are collected. Valid values for the parameter are 0, 1, 5, 10, 15, 30, and 60.

To disable Enhanced Monitoring using the RDS API, set the MonitoringInterval parameter to 0 in these operations.
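The parameter rules above can be sketched as a small helper that builds a ModifyDBInstance request. This is an illustration only: the function name is hypothetical, and it builds the request parameters without calling AWS.

```python
VALID_INTERVALS = (0, 1, 5, 10, 15, 30, 60)


def monitoring_params(db_instance_id, interval, role_arn=None):
    """Build keyword arguments for an RDS ModifyDBInstance request."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    params = {
        "DBInstanceIdentifier": db_instance_id,
        "MonitoringInterval": interval,
    }
    if interval != 0:
        # Enabling requires the role that lets RDS write to CloudWatch Logs.
        if not role_arn:
            raise ValueError("a monitoring role ARN is required when enabling")
        params["MonitoringRoleArn"] = role_arn
    return params
```

With boto3, the result could be passed as `boto3.client("rds").modify_db_instance(**monitoring_params("mydbinstance", 30, role_arn))`. An interval of 0 builds the request that disables Enhanced Monitoring.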

Viewing Enhanced Monitoring

You can view OS metrics reported by Enhanced Monitoring in the RDS console by choosing Enhanced monitoring for Monitoring.

The Enhanced Monitoring page is shown following.


Figure: Dashboard view

Some DB instances use more than one disk for the DB instance's data storage volume. On those DB instances, the Physical Devices graphs show metrics for each one of the disks. For example, the following graph shows metrics for four disks.


Figure: Graph with multiple disks

Note

Currently, Physical Devices graphs are not available for Microsoft SQL Server DB instances.

When you are viewing aggregated Disk I/O and File system graphs, the rdsdev device relates to the /rdsdbdata file system, where all database files and logs are stored. The filesystem device relates to the / file system (also known as root), where files related to the operating system are stored.


Figure: Graph showing file system usage

If the DB instance is a Multi-AZ deployment, you can view the OS metrics for the primary DB instance and its Multi-AZ standby replica. In the Enhanced monitoring view, choose primary to view the OS metrics for the primary DB instance, or choose secondary to view the OS metrics for the standby replica.


Figure: Primary and secondary choice for Enhanced Monitoring

For more information about Multi-AZ deployments, see High Availability (Multi-AZ) for Amazon RDS.

Note

Currently, viewing OS metrics for a Multi-AZ standby replica is not supported for MariaDB or Microsoft SQL Server DB instances.

If you want to see details for the processes running on your DB instance, choose OS process list for Monitoring.

The Process List view is shown following.


Figure: Process list view

The Enhanced Monitoring metrics shown in the Process list view are organized as follows:

  • RDS child processes – Shows a summary of the RDS processes that support the DB instance, for example, mysqld for MySQL DB instances. Process threads appear nested beneath the parent process. Process threads show CPU utilization only, because the other metrics are the same for all threads of the process. The console displays a maximum of 100 processes and threads. The results are a combination of the top CPU-consuming and memory-consuming processes and threads. If there are more than 50 processes and more than 50 threads, the console displays the top 50 consumers in each category. This display helps you identify which processes are having the greatest impact on performance.

  • RDS processes – Shows a summary of the resources used by the RDS management agent, diagnostics monitoring processes, and other AWS processes that are required to support RDS DB instances.

  • OS processes – Shows a summary of the kernel and system processes, which generally have minimal impact on performance.

The items listed for each process are:

  • VIRT – Displays the virtual size of the process.

  • RES – Displays the actual physical memory being used by the process.

  • CPU% – Displays the percentage of the total CPU bandwidth being used by the process.

  • MEM% – Displays the percentage of the total memory being used by the process.
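A similar ranking can be approximated outside the console. The sketch below assumes that one Enhanced Monitoring sample has already been parsed from JSON into a dictionary, and that it contains a `processList` array whose entries carry the `name`, `cpuUsedPc`, and `memoryUsedPc` fields listed in Available OS Metrics. The function name is illustrative.

```python
def top_processes(sample, metric="cpuUsedPc", count=5):
    """Return (name, value) pairs for the heaviest processes, largest first."""
    ranked = sorted(
        sample.get("processList", []),
        key=lambda proc: proc.get(metric, 0),
        reverse=True,
    )
    return [(proc["name"], proc.get(metric, 0)) for proc in ranked[:count]]
```

Passing `metric="memoryUsedPc"` ranks the same entries by memory share instead of CPU share.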

The monitoring data that is shown in the RDS console is retrieved from Amazon CloudWatch Logs. You can also retrieve the metrics for a DB instance as a log stream from CloudWatch Logs. For more information, see Viewing Enhanced Monitoring by Using CloudWatch Logs.

Enhanced Monitoring metrics are not returned during the following:

  • A failover of the DB instance.

  • Changing the instance class of the DB instance (scale compute).

Enhanced Monitoring metrics are returned during a reboot of a DB instance because only the database engine is rebooted. Metrics for the operating system are still reported.
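When you consume the metrics outside the console, periods such as a failover show up as missing samples. One way to spot them, sketched here under assumptions, is to compare consecutive event timestamps against the monitoring interval. The input is assumed to be a list of timestamps in milliseconds since the epoch, which is how CloudWatch Logs reports event times. The function name and the tolerance factor are illustrative.

```python
def find_gaps(timestamps_ms, interval_seconds, tolerance=2.0):
    """Return (start_ms, end_ms) pairs between which samples are missing.

    A gap is reported when two consecutive samples are further apart
    than the monitoring interval multiplied by the tolerance factor.
    """
    threshold_ms = interval_seconds * 1000 * tolerance
    ordered = sorted(timestamps_ms)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > threshold_ms
    ]
```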

Viewing Enhanced Monitoring by Using CloudWatch Logs

After you have enabled Enhanced Monitoring for your DB instance, you can view the metrics for your DB instance using CloudWatch Logs, with each log stream representing a single DB instance being monitored. The log stream identifier is the resource identifier (DbiResourceId) for the DB instance.

To view Enhanced Monitoring log data

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. If necessary, choose the AWS Region that your DB instance is in. For more information, see Regions and Endpoints in the Amazon Web Services General Reference.

  3. Choose Logs in the navigation pane.

  4. Choose RDSOSMetrics from the list of log groups.

    In a Multi-AZ deployment, log files with -secondary appended to the name are for the Multi-AZ standby replica.

    
    Figure: Multi-AZ standby replica log file
  5. Choose the log stream that you want to view from the list of log streams.
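The console steps above can also be approximated in code. The following is a minimal sketch, assuming a boto3 CloudWatch Logs client (for example, `boto3.client("logs")`) is passed in and that each log event message is one JSON document of OS metrics. The function names are illustrative.

```python
import json

LOG_GROUP = "RDSOSMetrics"


def stream_name(dbi_resource_id, standby=False):
    """Log stream name for a DB instance or its Multi-AZ standby replica."""
    return dbi_resource_id + "-secondary" if standby else dbi_resource_id


def latest_sample(logs_client, dbi_resource_id, standby=False):
    """Return the most recent OS metrics sample as a dict, or None."""
    response = logs_client.get_log_events(
        logGroupName=LOG_GROUP,
        logStreamName=stream_name(dbi_resource_id, standby),
        limit=1,
        startFromHead=False,  # read from the newest end of the stream
    )
    events = response.get("events", [])
    return json.loads(events[-1]["message"]) if events else None
```

The `dbi_resource_id` argument is the DbiResourceId of the DB instance, not its DB instance identifier.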

Available OS Metrics

The following tables list the OS metrics available using Amazon CloudWatch Logs.

Metrics for MariaDB, MySQL, Oracle, and PostgreSQL DB instances

Group | Metric | Console Name | Description
--- | --- | --- | ---
General | engine | Not applicable | The database engine for the DB instance.
General | instanceID | Not applicable | The DB instance identifier.
General | instanceResourceID | Not applicable | An immutable identifier for the DB instance that is unique to an AWS Region, also used as the log stream identifier.
General | numVCPUs | Not applicable | The number of virtual CPUs for the DB instance.
General | timestamp | Not applicable | The time at which the metrics were taken.
General | uptime | Not applicable | The amount of time that the DB instance has been active.
General | version | Not applicable | The version of the OS metrics' stream JSON format.
cpuUtilization | guest | CPU Guest | The percentage of CPU in use by guest programs.
cpuUtilization | idle | CPU Idle | The percentage of CPU that is idle.
cpuUtilization | irq | CPU IRQ | The percentage of CPU in use by software interrupts.
cpuUtilization | nice | CPU Nice | The percentage of CPU in use by programs running at lowest priority.
cpuUtilization | steal | CPU Steal | The percentage of CPU in use by other virtual machines.
cpuUtilization | system | CPU System | The percentage of CPU in use by the kernel.
cpuUtilization | total | CPU Total | The total percentage of the CPU in use. This value includes the nice value.
cpuUtilization | user | CPU User | The percentage of CPU in use by user programs.
cpuUtilization | wait | CPU Wait | The percentage of CPU unused while waiting for I/O access.
diskIO | avgQueueLen | Avg Queue Size | The number of requests waiting in the I/O device's queue.
diskIO | avgReqSz | Ave Request Size | The average request size, in kilobytes.
diskIO | await | Disk I/O Await | The number of milliseconds required to respond to requests, including queue time and service time.
diskIO | device | Not applicable | The identifier of the disk device in use.
diskIO | diskQueueDepth | Disk Queue Depth | The number of pending input and output (I/O) requests waiting to access the disk.
diskIO | readIOsPS | Read IO/s | The number of read operations per second.
diskIO | readKb | Read Total | The total number of kilobytes read.
diskIO | readKbPS | Read Kb/s | The number of kilobytes read per second.
diskIO | readLatency | Read Latency | The elapsed time between the submission of a read I/O request and its completion, in milliseconds. This metric is only available for Amazon Aurora.
diskIO | readThroughput | Read Throughput | The amount of network throughput used by requests to the DB cluster, in bytes per second. This metric is only available for Amazon Aurora.
diskIO | rrqmPS | Rrqms | The number of merged read requests queued per second.
diskIO | tps | TPS | The number of I/O transactions per second.
diskIO | util | Disk I/O Util | The percentage of CPU time during which requests were issued.
diskIO | writeIOsPS | Write IO/s | The number of write operations per second.
diskIO | writeKb | Write Total | The total number of kilobytes written.
diskIO | writeKbPS | Write Kb/s | The number of kilobytes written per second.
diskIO | writeLatency | Write Latency | The average elapsed time between the submission of a write I/O request and its completion, in milliseconds. This metric is only available for Amazon Aurora.
diskIO | writeThroughput | Write Throughput | The amount of network throughput used by responses from the DB cluster, in bytes per second. This metric is only available for Amazon Aurora.
diskIO | wrqmPS | Wrqms | The number of merged write requests queued per second.
physicalDeviceIO | avgQueueLen | Physical Devices Avg Queue Size | The number of requests waiting in the I/O device's queue.
physicalDeviceIO | avgReqSz | Physical Devices Ave Request Size | The average request size, in kilobytes.
physicalDeviceIO | await | Physical Devices Disk I/O Await | The number of milliseconds required to respond to requests, including queue time and service time.
physicalDeviceIO | device | Not applicable | The identifier of the disk device in use.
physicalDeviceIO | readIOsPS | Physical Devices Read IO/s | The number of read operations per second.
physicalDeviceIO | readKb | Physical Devices Read Total | The total number of kilobytes read.
physicalDeviceIO | readKbPS | Physical Devices Read Kb/s | The number of kilobytes read per second.
physicalDeviceIO | rrqmPS | Physical Devices Rrqms | The number of merged read requests queued per second.
physicalDeviceIO | tps | Physical Devices TPS | The number of I/O transactions per second.
physicalDeviceIO | util | Physical Devices Disk I/O Util | The percentage of CPU time during which requests were issued.
physicalDeviceIO | writeIOsPS | Physical Devices Write IO/s | The number of write operations per second.
physicalDeviceIO | writeKb | Physical Devices Write Total | The total number of kilobytes written.
physicalDeviceIO | writeKbPS | Physical Devices Write Kb/s | The number of kilobytes written per second.
physicalDeviceIO | wrqmPS | Physical Devices Wrqms | The number of merged write requests queued per second.
fileSys | maxFiles | Max Inodes | The maximum number of files that can be created for the file system.
fileSys | mountPoint | Not applicable | The path to the file system.
fileSys | name | Not applicable | The name of the file system.
fileSys | total | Total Filesystem | The total amount of disk space available for the file system, in kilobytes.
fileSys | used | Used Filesystem | The amount of disk space used by files in the file system, in kilobytes.
fileSys | usedFilePercent | Used % | The percentage of available files in use.
fileSys | usedFiles | Used Inodes | The number of files in the file system.
fileSys | usedPercent | Used Inodes % | The percentage of the file-system disk space in use.
loadAverageMinute | fifteen | Load Avg 15 min | The number of processes requesting CPU time over the last 15 minutes.
loadAverageMinute | five | Load Avg 5 min | The number of processes requesting CPU time over the last 5 minutes.
loadAverageMinute | one | Load Avg 1 min | The number of processes requesting CPU time over the last minute.
memory | active | Active Memory | The amount of assigned memory, in kilobytes.
memory | buffers | Buffered Memory | The amount of memory used for buffering I/O requests prior to writing to the storage device, in kilobytes.
memory | cached | Cached Memory | The amount of memory used for caching file system–based I/O.
memory | dirty | Dirty Memory | The amount of memory pages in RAM that have been modified but not written to their related data block in storage, in kilobytes.
memory | free | Free Memory | The amount of unassigned memory, in kilobytes.
memory | hugePagesFree | Huge Pages Free | The number of free huge pages. Huge pages are a feature of the Linux kernel.
memory | hugePagesRsvd | Huge Pages Rsvd | The number of committed huge pages.
memory | hugePagesSize | Huge Pages Size | The size for each huge pages unit, in kilobytes.
memory | hugePagesSurp | Huge Pages Surp | The number of available surplus huge pages over the total.
memory | hugePagesTotal | Huge Pages Total | The total number of huge pages.
memory | inactive | Inactive Memory | The amount of least-frequently used memory pages, in kilobytes.
memory | mapped | Mapped Memory | The total amount of file-system contents that is memory mapped inside a process address space, in kilobytes.
memory | pageTables | Page Tables | The amount of memory used by page tables, in kilobytes.
memory | slab | Slab Memory | The amount of reusable kernel data structures, in kilobytes.
memory | total | Total Memory | The total amount of memory, in kilobytes.
memory | writeback | Writeback Memory | The amount of dirty pages in RAM that are still being written to the backing storage, in kilobytes.
network | interface | Not applicable | The identifier for the network interface being used for the DB instance.
network | rx | RX | The number of bytes received per second.
network | tx | TX | The number of bytes uploaded per second.
processList | cpuUsedPc | CPU % | The percentage of CPU used by the process.
processList | id | Not applicable | The identifier of the process.
processList | memoryUsedPc | MEM% | The percentage of total memory used by the process.
processList | name | Not applicable | The name of the process.
processList | parentID | Not applicable | The process identifier for the parent process of the process.
processList | rss | RES | The amount of RAM allocated to the process, in kilobytes.
processList | tgid | Not applicable | The thread group identifier, which is a number representing the process ID to which a thread belongs. This identifier is used to group threads from the same process.
processList | VIRT | VIRT | The amount of virtual memory allocated to the process, in kilobytes.
swap | swap | Swap | The amount of swap memory available, in kilobytes.
swap | swap in | Swaps in | The amount of memory, in kilobytes, swapped in from disk.
swap | swap out | Swaps out | The amount of memory, in kilobytes, swapped out to disk.
swap | free | Free Swap | The amount of swap memory free, in kilobytes.
swap | committed | Committed Swap | The amount of swap memory, in kilobytes, used as cache memory.
tasks | blocked | Tasks Blocked | The number of tasks that are blocked.
tasks | running | Tasks Running | The number of tasks that are running.
tasks | sleeping | Tasks Sleeping | The number of tasks that are sleeping.
tasks | stopped | Tasks Stopped | The number of tasks that are stopped.
tasks | total | Tasks Total | The total number of tasks.
tasks | zombie | Tasks Zombie | The number of child tasks that are inactive with an active parent task.
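As an illustration of how these metrics might be consumed, the sketch below pulls a few headline values out of one sample for a MariaDB, MySQL, Oracle, or PostgreSQL DB instance. It assumes the sample has already been parsed from JSON into a dictionary, that the group names above (`cpuUtilization`, `memory`, `fileSys`) are top-level keys, and that `fileSys` is an array with one entry per file system. The function name and output keys are hypothetical.

```python
def summarize_linux_sample(sample):
    """Pick a few headline values out of one OS metrics sample."""
    cpu = sample["cpuUtilization"]
    memory = sample["memory"]
    return {
        "cpu_total_pct": cpu["total"],
        "cpu_wait_pct": cpu["wait"],
        # Free memory as a share of total memory; both are in kilobytes.
        "memory_free_pct": round(100.0 * memory["free"] / memory["total"], 1),
        # Highest disk-space usage across all reported file systems.
        "fullest_file_system_pct": max(
            (fs["usedPercent"] for fs in sample.get("fileSys", [])),
            default=None,
        ),
    }
```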

Metrics for Microsoft SQL Server DB instances

Group | Metric | Console Name | Description
--- | --- | --- | ---
General | engine | Not applicable | The database engine for the DB instance.
General | instanceID | Not applicable | The DB instance identifier.
General | instanceResourceID | Not applicable | An immutable identifier for the DB instance that is unique to an AWS Region, also used as the log stream identifier.
General | numVCPUs | Not applicable | The number of virtual CPUs for the DB instance.
General | timestamp | Not applicable | The time at which the metrics were taken.
General | uptime | Not applicable | The amount of time that the DB instance has been active.
General | version | Not applicable | The version of the OS metrics' stream JSON format.
cpuUtilization | idle | CPU Idle | The percentage of CPU that is idle.
cpuUtilization | kern | CPU Kernel | The percentage of CPU in use by the kernel.
cpuUtilization | user | CPU User | The percentage of CPU in use by user programs.
disks | name | Not applicable | The identifier for the disk.
disks | totalKb | Total Disk Space | The total space of the disk, in kilobytes.
disks | usedKb | Used Disk Space | The amount of space used on the disk, in kilobytes.
disks | usedPc | Used Disk Space % | The percentage of space used on the disk.
disks | availKb | Available Disk Space | The space available on the disk, in kilobytes.
disks | availPc | Available Disk Space % | The percentage of space available on the disk.
disks | rdCountPS | Reads/s | The number of read operations per second.
disks | rdBytesPS | Read Kb/s | The number of bytes read per second.
disks | wrCountPS | Write IO/s | The number of write operations per second.
disks | wrBytesPS | Write Kb/s | The number of bytes written per second.
memory | commitTotKb | Commit Total | The amount of pagefile-backed virtual address space in use, that is, the current commit charge. This value is composed of main memory (RAM) and disk (pagefiles).
memory | commitLimitKb | Maximum Commit | The maximum possible value for the commitTotKb metric. This value is the sum of the current pagefile size plus the physical memory available for pageable contents, excluding RAM that is assigned to nonpageable areas.
memory | commitPeakKb | Commit Peak | The largest value of the commitTotKb metric since the operating system was last started.
memory | kernTotKb | Total Kernel Memory | The sum of the memory in the paged and nonpaged kernel pools, in kilobytes.
memory | kernPagedKb | Paged Kernel Memory | The amount of memory in the paged kernel pool, in kilobytes.
memory | kernNonpagedKb | Nonpaged Kernel Memory | The amount of memory in the nonpaged kernel pool, in kilobytes.
memory | pageSize | Page Size | The size of a page, in bytes.
memory | physTotKb | Total Memory | The amount of physical memory, in kilobytes.
memory | physAvailKb | Available Memory | The amount of available physical memory, in kilobytes.
memory | sqlServerTotKb | SQL Server Total Memory | The amount of memory committed to SQL Server, in kilobytes.
memory | sysCacheKb | System Cache | The amount of system cache memory, in kilobytes.
network | interface | Not applicable | The identifier for the network interface being used for the DB instance.
network | rdBytesPS | Network Read Kb/s | The number of bytes received per second.
network | wrBytesPS | Network Write Kb/s | The number of bytes sent per second.
processList | cpuUsedPc | Used % | The percentage of CPU used by the process.
processList | memUsedPc | MEM% | The percentage of total memory used by the process.
processList | name | Not applicable | The name of the process.
processList | pid | Not applicable | The identifier of the process. This value is not present for processes that are owned by Amazon RDS.
processList | ppid | Not applicable | The process identifier for the parent of this process. This value is only present for child processes.
processList | tid | Not applicable | The thread identifier. This value is only present for threads. The owning process can be identified by using the pid value.
processList | workingSetKb | Not applicable | The amount of memory in the private working set plus the amount of memory that is in use by the process and can be shared with other processes, in kilobytes.
processList | workingSetPrivKb | Not applicable | The amount of memory that is in use by a process, but can't be shared with other processes, in kilobytes.
processList | workingSetShareableKb | Not applicable | The amount of memory that is in use by a process and can be shared with other processes, in kilobytes.
processList | virtKb | Not applicable | The amount of virtual address space the process is using, in kilobytes. Use of virtual address space doesn't necessarily imply corresponding use of either disk or main memory pages.
system | handles | Handles | The number of handles that the system is using.
system | processes | Processes | The number of processes running on the system.
system | threads | Threads | The number of threads running on the system.
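For Microsoft SQL Server DB instances, the `disks` group lends itself to a simple free-space check. The sketch below assumes one sample has been parsed from JSON into a dictionary and that `disks` is an array with one entry per disk, each carrying the `name` and `availPc` fields described above. The function name and the 10 percent default threshold are illustrative.

```python
def low_disk_space(sample, min_avail_pct=10.0):
    """Return names of disks whose available space is below the threshold."""
    return [
        disk["name"]
        for disk in sample.get("disks", [])
        if disk["availPc"] < min_avail_pct
    ]
```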