Amazon ElastiCache for Redis
ElastiCache for Redis User Guide (API Version 2015-02-02)

Which Metrics Should I Monitor?

The following CloudWatch metrics offer good insight into ElastiCache performance. In most cases, we recommend that you set CloudWatch alarms for these metrics so that you can take corrective action before performance issues occur.

CPUUtilization

This is a host-level metric reported as a percent. For more information, see Host-Level Metrics.

Generally speaking, we suggest that you set your threshold at 90% of your available CPU capacity. Because Redis is single-threaded, it can use only one core, so the threshold for this host-level metric must be calculated as a fraction of the node's total capacity: 90% divided by the number of cores. For example, suppose you are using a node type that has four cores. In this case, the threshold for CPUUtilization would be 90/4, or 22.5%. To find the number of cores (vCPUs) your node type has, see Amazon ElastiCache Pricing.
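The threshold arithmetic above can be sketched as a small helper. This is a minimal illustration, not an official tool: the function names cpu_alarm_threshold and cpu_alarm_params are hypothetical, and the alarm name, 5-minute period, and three evaluation periods are assumed values that you would tune for your own workload. The dictionary keys follow the parameter names of the CloudWatch PutMetricAlarm operation.

```python
def cpu_alarm_threshold(vcpus: int, target_percent: float = 90.0) -> float:
    """Return the host-level CPUUtilization alarm threshold, in percent.

    Redis uses a single core, so the target for that one core is divided
    by the number of cores (vCPUs) on the node.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return target_percent / vcpus


def cpu_alarm_params(cluster_id: str, vcpus: int) -> dict:
    """Build keyword arguments for a CloudWatch alarm on CPUUtilization.

    The alarm name, period, and evaluation periods are assumptions made
    for illustration only.
    """
    return {
        "AlarmName": f"{cluster_id}-cpu-utilization",  # hypothetical naming scheme
        "Namespace": "AWS/ElastiCache",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,            # assumed: 5-minute datapoints
        "EvaluationPeriods": 3,   # assumed: three consecutive breaches
        "Threshold": cpu_alarm_threshold(vcpus),
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = cpu_alarm_params("my-redis-001", vcpus=4)  # "my-redis-001" is a placeholder ID
```

If you use the AWS SDK for Python, a dictionary like this could be passed to the CloudWatch client as put_metric_alarm(**params).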

You will need to determine your own threshold, based on the number of cores in the cache node that you are using. If you exceed this threshold, and your main workload is from read requests, scale your cache cluster out by adding read replicas. If the main workload is from write requests, depending on your cluster configuration, we recommend that you:

  • Redis (cluster mode disabled) clusters: scale up by using a larger cache instance type.

  • Redis (cluster mode enabled) clusters: add more shards to distribute the write workload across more primary nodes.
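The scaling guidance above can be summarized as a simple decision function. This is only a sketch of the recommendations in this section; the function name scaling_recommendation, its parameters, and the returned strings are hypothetical, and deciding whether a workload is mainly reads or mainly writes is left to you.

```python
def scaling_recommendation(
    cpu_percent: float,
    threshold: float,
    dominant_workload: str,
    cluster_mode_enabled: bool,
) -> str:
    """Map a CPUUtilization reading to the scaling action suggested above.

    dominant_workload must be "read" or "write" (an assumed classification
    that the caller derives from its own traffic analysis).
    """
    if dominant_workload not in ("read", "write"):
        raise ValueError("dominant_workload must be 'read' or 'write'")

    # Below or at the threshold: nothing to do.
    if cpu_percent <= threshold:
        return "no action"

    # Read-heavy workloads scale out with read replicas.
    if dominant_workload == "read":
        return "add read replicas"

    # Write-heavy workloads depend on the cluster configuration.
    if cluster_mode_enabled:
        return "add shards"
    return "scale up to a larger node type"
```

For example, a four-core node with a 22.5% threshold that reports 30% CPUUtilization under a write-heavy workload would yield "scale up to a larger node type" when cluster mode is disabled, and "add shards" when it is enabled.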

Tip

Instead of using the host-level metric CPUUtilization, Redis users might be able to use the Redis metric EngineCPUUtilization, which reports the percentage of usage on the single core available to you. To see whether this metric is available on your nodes, and for more information, see Metrics for Redis.

SwapUsage

This is a host-level metric reported in bytes. For more information, see Host-Level Metrics.

This metric should not exceed 50 MB. If it does, see the following topics:

Evictions

This is a cache engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

CurrConnections

This is a cache engine metric. We recommend that you determine your own alarm threshold for this metric based on your application needs.

A steadily increasing value of CurrConnections might indicate a problem with your application. To address this issue, investigate the application's behavior.