Recommended metrics - Amazon CloudWatch

The following table lists the recommended metrics for each component type.

Component type Workload type Recommended metric

EC2 instance (Windows servers)

Default/Custom

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

LogicalDisk % Free Space

Memory Available Mbytes

Active Directory

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

Database ==> Instances Database Cache % Hit

DirectoryServices DRA Pending Replication Operations

DirectoryServices DRA Pending Replication Synchronizations

DNS Recursive Query Failure/sec

LogicalDisk Avg. Disk Queue Length

Java Application

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize

Microsoft IIS/.NET Web Front-End

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

.NET CLR Exceptions # of Exceps Thrown/Sec

.NET CLR Memory # Total Committed Bytes

.NET CLR Memory % Time in GC

ASP.NET Applications Requests in Application Queue

ASP.NET Requests Queued

ASP.NET Application Restarts

Microsoft SQL Server Database Tier

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

Paging File % Usage

System Processor Queue Length

Network Interface Bytes Total/Sec

PhysicalDisk % Disk Time

SQLServer:Buffer Manager Buffer Cache Hit ratio

SQLServer:Buffer Manager Page Life Expectancy

SQLServer:General Statistics Processes Blocked

SQLServer:General Statistics User Connections

SQLServer:Locks Number of Deadlocks/Sec

SQLServer:SQL Statistics Batch Requests/Sec

MySQL

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

LogicalDisk % Free Space

Memory Available Mbytes

.NET workerpool/Mid-Tier

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

.NET CLR Exceptions # of Exceps Thrown/Sec

.NET CLR Memory # Total Committed Bytes

.NET CLR Memory % Time in GC

.NET Core Tier

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

Oracle

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

LogicalDisk % Free Space

Memory Available Mbytes

Postgres

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

LogicalDisk % Free Space

Memory Available Mbytes

SharePoint

CPUUtilization

StatusCheckFailed

Processor % Processor Time

Memory % Committed Bytes In Use

Memory Available Mbytes

ASP.NET Applications Cache API trims

ASP.NET Requests Rejected

ASP.NET Worker Process Restarts

Memory Pages/sec

SharePoint Publishing Cache Publishing cache flushes / second

SharePoint Foundation Executing Time/Page Request

SharePoint Disk-Based Cache Total number of cache compactions

SharePoint Disk-Based Cache Blob cache hit ratio

SharePoint Disk-Based Cache Blob Cache fill ratio

SharePoint Disk-Based Cache Blob cache flushes / second

ASP.NET Requests Queued

ASP.NET Applications Requests in Application Queue

ASP.NET Application Restarts

LogicalDisk Avg. Disk sec/Write

LogicalDisk Avg. Disk sec/Read

Processor % Interrupt Time

EC2 instance (Linux servers)

Default/Custom

CPUUtilization

StatusCheckFailed

disk_used_percent

mem_used_percent

Java Application

CPUUtilization

StatusCheckFailed

disk_used_percent

mem_used_percent

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize

.NET Core Tier or SQL Server Database Tier

CPUUtilization

StatusCheckFailed

disk_used_percent

mem_used_percent

Oracle

CPUUtilization

StatusCheckFailed

disk_used_percent

mem_used_percent

Postgres

CPUUtilization

StatusCheckFailed

disk_used_percent

mem_used_percent

EC2 instance group

SAP HANA multi-node or single node
  • hanadb_server_startup_time_variations_seconds

  • hanadb_level_5_alerts_count

  • hanadb_level_4_alerts_count

  • hanadb_out_of_memory_events_count

  • hanadb_max_trigger_read_ratio_percent

  • hanadb_max_trigger_write_ratio_percent

  • hanadb_log_switch_race_ratio_percent

  • hanadb_time_since_last_savepoint_seconds

  • hanadb_disk_usage_highlevel_percent

  • hanadb_current_allocation_limit_used_percent

  • hanadb_table_allocation_limit_used_percent

  • hanadb_cpu_usage_percent

  • hanadb_plan_cache_hit_ratio_percent

  • hanadb_last_data_backup_age_days

EBS volume

Any

VolumeReadBytes

VolumeWriteBytes

VolumeReadOps

VolumeWriteOps

VolumeQueueLength

VolumeThroughputPercentage

VolumeConsumedReadWriteOps

BurstBalance

Classic ELB

Any

HTTPCode_Backend_4XX

HTTPCode_Backend_5XX

Latency

SurgeQueueLength

UnHealthyHostCount

Application ELB

Any

HTTPCode_Target_4XX_Count

HTTPCode_Target_5XX_Count

TargetResponseTime

UnHealthyHostCount

RDS Database instance

Any

CPUUtilization

ReadLatency

WriteLatency

BurstBalance

FailedSQLServerAgentJobsCount

RDS Database cluster

Any

CPUUtilization

CommitLatency

DatabaseConnections

Deadlocks

FreeableMemory

NetworkThroughput

VolumeBytesUsed

Lambda Function

Any

Duration

Errors

IteratorAge

ProvisionedConcurrencySpilloverInvocations

Throttles

SQS Queue

Any

ApproximateAgeOfOldestMessage

ApproximateNumberOfMessagesVisible

NumberOfMessagesSent

Amazon DynamoDB table

Any

SystemErrors

UserErrors

ConsumedReadCapacityUnits

ConsumedWriteCapacityUnits

ReadThrottleEvents

WriteThrottleEvents

ConditionalCheckFailedRequests

TransactionConflict

Amazon S3 bucket

Any

If replication configuration with Replication Time Control (RTC) is enabled:

ReplicationLatency

BytesPendingReplication

OperationsPendingReplication

If request metrics are turned on:

5xxErrors

4xxErrors

BytesDownloaded

BytesUploaded

AWS Step Functions

Any

General
  • ExecutionThrottled

  • ExecutionsAborted

  • ProvisionedBucketSize

  • ProvisionedRefillRate

  • ConsumedCapacity

If state machine type is EXPRESS or log group level is OFF
  • ExecutionsFailed

  • ExecutionsTimedOut

If state machine has Lambda functions
  • LambdaFunctionsFailed

  • LambdaFunctionsTimedOut

If state machine has activities
  • ActivitiesFailed

  • ActivitiesTimedOut

  • ActivitiesHeartbeatTimedOut

If state machine has service integrations
  • ServiceIntegrationsFailed

  • ServiceIntegrationsTimedOut

API Gateway REST API stage

Any
  • 4XXErrors

  • 5XXErrors

  • Latency

ECS Cluster

Any

CpuUtilized

MemoryUtilized

NetworkRxBytes

NetworkTxBytes

RunningTaskCount

PendingTaskCount

StorageReadBytes

StorageWriteBytes

CPUReservation (EC2 Launch Type only)

CPUUtilization (EC2 Launch Type only)

MemoryReservation (EC2 Launch Type only)

MemoryUtilization (EC2 Launch Type only)

GPUReservation (EC2 Launch Type only)

instance_cpu_utilization (EC2 Launch Type only)

instance_filesystem_utilization (EC2 Launch Type only)

instance_memory_utilization (EC2 Launch Type only)

instance_network_total_bytes (EC2 Launch Type only)

Java Application

CpuUtilized

MemoryUtilized

NetworkRxBytes

NetworkTxBytes

RunningTaskCount

PendingTaskCount

StorageReadBytes

StorageWriteBytes

CPUReservation (EC2 Launch Type only)

CPUUtilization (EC2 Launch Type only)

MemoryReservation (EC2 Launch Type only)

MemoryUtilization (EC2 Launch Type only)

GPUReservation (EC2 Launch Type only)

instance_cpu_utilization (EC2 Launch Type only)

instance_filesystem_utilization (EC2 Launch Type only)

instance_memory_utilization (EC2 Launch Type only)

instance_network_total_bytes (EC2 Launch Type only)

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize

ECS Service

Any

CPUUtilization

MemoryUtilization

CpuUtilized

MemoryUtilized

NetworkRxBytes

NetworkTxBytes

RunningTaskCount

PendingTaskCount

StorageReadBytes

StorageWriteBytes

Java Application

CPUUtilization

MemoryUtilization

CpuUtilized

MemoryUtilized

NetworkRxBytes

NetworkTxBytes

RunningTaskCount

PendingTaskCount

StorageReadBytes

StorageWriteBytes

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize

EKS Cluster

Any

cluster_failed_node_count

node_cpu_reserved_capacity

node_cpu_utilization

node_filesystem_utilization

node_memory_reserved_capacity

node_memory_utilization

node_network_total_bytes

pod_cpu_reserved_capacity

pod_cpu_utilization

pod_cpu_utilization_over_pod_limit

pod_memory_reserved_capacity

pod_memory_utilization

pod_memory_utilization_over_pod_limit

pod_network_rx_bytes

pod_network_tx_bytes

Java Application

cluster_failed_node_count

node_cpu_reserved_capacity

node_cpu_utilization

node_filesystem_utilization

node_memory_reserved_capacity

node_memory_utilization

node_network_total_bytes

pod_cpu_reserved_capacity

pod_cpu_utilization

pod_cpu_utilization_over_pod_limit

pod_memory_reserved_capacity

pod_memory_utilization

pod_memory_utilization_over_pod_limit

pod_network_rx_bytes

pod_network_tx_bytes

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize

Kubernetes Cluster on EC2

Any

cluster_failed_node_count

node_cpu_reserved_capacity

node_cpu_utilization

node_filesystem_utilization

node_memory_reserved_capacity

node_memory_utilization

node_network_total_bytes

pod_cpu_reserved_capacity

pod_cpu_utilization

pod_cpu_utilization_over_pod_limit

pod_memory_reserved_capacity

pod_memory_utilization

pod_memory_utilization_over_pod_limit

pod_network_rx_bytes

pod_network_tx_bytes

Java Application

cluster_failed_node_count

node_cpu_reserved_capacity

node_cpu_utilization

node_filesystem_utilization

node_memory_reserved_capacity

node_memory_utilization

node_network_total_bytes

pod_cpu_reserved_capacity

pod_cpu_utilization

pod_cpu_utilization_over_pod_limit

pod_memory_reserved_capacity

pod_memory_utilization

pod_memory_utilization_over_pod_limit

pod_network_rx_bytes

pod_network_tx_bytes

java_lang_threading_threadcount

java_lang_classloading_loadedclasscount

java_lang_memory_heapmemoryusage_used

java_lang_memory_heapmemoryusage_committed

java_lang_operatingsystem_freephysicalmemorysize

java_lang_operatingsystem_freeswapspacesize
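
The table above is effectively a lookup from a (component type, workload type) pair to a list of metric names. As a minimal sketch, the following Python snippet encodes a small subset of those rows as a dictionary and looks them up. The names RECOMMENDED_METRICS and recommended_metrics are hypothetical helpers introduced for illustration only. They are not part of any AWS SDK, and the snippet does not call CloudWatch.

```python
# Illustrative subset of the rows in the table above, keyed by
# (component type, workload type). Only a few rows are copied here.
RECOMMENDED_METRICS = {
    ("EC2 instance (Linux servers)", "Default/Custom"): [
        "CPUUtilization",
        "StatusCheckFailed",
        "disk_used_percent",
        "mem_used_percent",
    ],
    ("Classic ELB", "Any"): [
        "HTTPCode_Backend_4XX",
        "HTTPCode_Backend_5XX",
        "Latency",
        "SurgeQueueLength",
        "UnHealthyHostCount",
    ],
    ("Application ELB", "Any"): [
        "HTTPCode_Target_4XX_Count",
        "HTTPCode_Target_5XX_Count",
        "TargetResponseTime",
        "UnHealthyHostCount",
    ],
    ("SQS Queue", "Any"): [
        "ApproximateAgeOfOldestMessage",
        "ApproximateNumberOfMessagesVisible",
        "NumberOfMessagesSent",
    ],
}


def recommended_metrics(component_type, workload_type="Any"):
    """Return a copy of the metric list for the pair.

    Returns an empty list when the pair is not in this illustrative subset.
    """
    return list(RECOMMENDED_METRICS.get((component_type, workload_type), []))


if __name__ == "__main__":
    for metric in recommended_metrics("SQS Queue"):
        print(metric)
```

A lookup keyed on both columns mirrors how the table is read: the same component type (for example, an EC2 instance) can map to different metric lists depending on the workload type.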

The following table lists the recommended processes and process metrics for each component type. CloudWatch Application Insights does not recommend process monitoring for processes that do not run on an instance.

Component type Workload type Recommended process Recommended metric

EC2 instance (Windows servers)

Microsoft IIS/.NET Web Front-End

w3wp

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

Microsoft SQL Server Database Tier

SQLAgent

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

sqlservr

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

sqlwriter

procstat cpu_usage,

procstat memory_rss

ReportingServicesService

procstat cpu_usage,

procstat memory_rss

MsDtsServr

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

Msmdsrv

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

.NET workerpool/Mid-Tier

w3wp

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes

.NET Core Tier

w3wp

procstat cpu_usage,

procstat memory_rss,

procstat memory_vms,

procstat read_bytes,

procstat write_bytes
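
The procstat metrics listed above are collected by the CloudWatch agent's procstat plugin. As a rough sketch, the Python snippet below builds a CloudWatch agent configuration fragment for the w3wp row of the table. It assumes the agent's documented procstat layout, where each entry selects a process with an "exe" key and lists metric names under "measurement". The procstat_config helper is hypothetical. Whether Application Insights generates exactly this configuration is an assumption and is not stated in this page.

```python
import json

# The five process metrics recommended for w3wp in the table above.
PROCSTAT_MEASUREMENTS = [
    "cpu_usage",
    "memory_rss",
    "memory_vms",
    "read_bytes",
    "write_bytes",
]


def procstat_config(process_names, measurements=PROCSTAT_MEASUREMENTS):
    """Build an agent config fragment with one procstat entry per process.

    Hypothetical helper: it only assembles a dictionary and does not
    validate it against the agent's configuration schema.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "procstat": [
                    {"exe": name, "measurement": list(measurements)}
                    for name in process_names
                ]
            }
        }
    }


if __name__ == "__main__":
    # Fragment for the Microsoft IIS/.NET Web Front-End row.
    print(json.dumps(procstat_config(["w3wp"]), indent=2))
```

For rows with a shorter metric list, such as sqlwriter, the same helper can be called with measurements=["cpu_usage", "memory_rss"].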