Creating Amazon CloudWatch Alarms
You can create a CloudWatch alarm that watches a single metric. The alarm performs one or more actions based on the value of the metric relative to a threshold over a number of time periods. The action can be an Amazon EC2 action, an Auto Scaling action, or a notification sent to an Amazon SNS topic.
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods.
After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action that you have associated with the alarm. For Amazon EC2 and Auto Scaling actions, the alarm continues to invoke the action for every period that the alarm remains in the new state. For Amazon SNS notifications, no additional actions are invoked.
CloudWatch doesn't test or validate the actions that you specify, nor does it detect any Auto Scaling or Amazon SNS errors resulting from an attempt to invoke nonexistent actions. Make sure that your actions exist.
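As a concrete illustration, here is a sketch of an alarm definition with an SNS notification action, using parameter names that mirror the PutMetricAlarm API request (for example, as passed to boto3's put_metric_alarm). The alarm name, instance ID, and topic ARN below are hypothetical placeholders.

```python
# A minimal sketch of an alarm definition with an SNS notification action.
# The alarm name, instance ID, and SNS topic ARN are hypothetical.
ALARM_PARAMS = {
    "AlarmName": "ec2-high-cpu",  # hypothetical name
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    "Statistic": "Average",
    "Period": 300,                # seconds per evaluation period
    "EvaluationPeriods": 3,       # state must be sustained for 3 periods
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    # The SNS topic must already exist; CloudWatch does not validate it.
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
}

def create_alarm(params=ALARM_PARAMS):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # deferred so the definition above can be inspected offline
    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Because CloudWatch does not validate the action, double-check that the ARN in AlarmActions points at a real topic before creating the alarm.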
You can also add alarms to dashboards. When an alarm is on a dashboard, it turns red when it is in the ALARM state, making it easier for you to monitor its status proactively.
An alarm has three possible states:
OK—The metric is within the defined threshold
ALARM—The metric is outside of the defined threshold
INSUFFICIENT_DATA—The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state
In the following figure, the alarm threshold is set to 3 and the evaluation period is 3. That is, the alarm invokes its action if three consecutive periods are breaching, which here means a value of 3 or higher. In the figure, this happens with the third through fifth time periods, and the alarm's state is set to ALARM. At period six, the value dips below the threshold, and the state reverts to OK. Later, during the ninth time period, the threshold is breached again, but for only one period. Consequently, the alarm state remains OK.
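This evaluation can be modeled in a few lines. The following is a simplified sketch, not CloudWatch's actual algorithm: it reports ALARM for a period only when the most recent three datapoints all meet or exceed the threshold. The sample series is hypothetical, chosen to match the figure's description.

```python
def alarm_states(datapoints, threshold=3, evaluation_periods=3):
    """Simplified model: ALARM when the last `evaluation_periods`
    consecutive datapoints all meet or exceed `threshold`; otherwise OK."""
    states = []
    for i in range(len(datapoints)):
        window = datapoints[max(0, i - evaluation_periods + 1): i + 1]
        breaching = (len(window) == evaluation_periods
                     and all(v >= threshold for v in window))
        states.append("ALARM" if breaching else "OK")
    return states

# Hypothetical series matching the description: periods 3-5 breach,
# period 6 dips below, and period 9 breaches for a single period.
series = [2, 2, 3, 4, 5, 2, 1, 2, 4, 2]
```

Running `alarm_states(series)` yields ALARM only at period 5, once three consecutive breaching periods have accumulated; the single breach at period 9 leaves the state at OK.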
Configuring How CloudWatch Alarms Treat Missing Data
Similar to how each alarm is always in one of three states, each specific data point reported to CloudWatch falls into one of three categories:
good (within the threshold)
bad (violating the threshold)
missing (not reported to CloudWatch)
You can specify how alarms handle missing data points. Choose whether to treat missing data points as:
Missing (the alarm looks back farther in time to find additional data points)
Good (treated as a data point that is within the threshold)
Bad (treated as a data point that is breaching the threshold)
Ignored (the current alarm state is maintained)
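In the PutMetricAlarm API, these choices correspond to the TreatMissingData parameter. The string values below are the API's own; the dictionary keyed by the terms used above is just for illustration.

```python
# TreatMissingData values accepted by PutMetricAlarm, keyed by the terms
# used above. Pass one of these strings as "TreatMissingData" in the
# alarm definition.
MISSING_DATA_TREATMENTS = {
    "missing": "missing",     # look back for additional data points
    "good": "notBreaching",   # count as within the threshold
    "bad": "breaching",       # count as violating the threshold
    "ignored": "ignore",      # maintain the current alarm state
}
```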
The best choice depends on the type of metric. For a metric that continually reports data, such as CPUUtilization of an instance, you might want to treat missing data points as bad, because they may indicate that something is wrong. But for a metric that generates data points only when an error occurs, such as ThrottledRequests in Amazon DynamoDB, you would want to treat missing data as good, because missing data points indicate the absence of errors.
Choosing the best option for your alarm prevents unnecessary and misleading alarm condition changes, and also more accurately indicates the health of your system.
If you treat missing data as missing and some data points in the current window are missing, CloudWatch looks back extra periods to find other existing data points to assess whether the alarm should change state. When this happens, if the furthest back period that is now being considered is not breaching, the alarm state does not go to ALARM.
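A rough model of this look-back behavior, under simplifying assumptions (missing datapoints are represented as None, the look-back distance is capped, and the alarm needs a full window of real datapoints before evaluating):

```python
def evaluate_with_lookback(datapoints, threshold, evaluation_periods,
                           max_lookback=5):
    """Simplified look-back model: use the most recent `evaluation_periods`
    non-missing datapoints, searching up to `max_lookback` extra periods,
    and go to ALARM only if all of them breach the threshold."""
    candidates = datapoints[-(evaluation_periods + max_lookback):]
    recent = [v for v in candidates if v is not None][-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(v >= threshold for v in recent) else "OK"
```

With a threshold of 3 and three evaluation periods, `[4, 4, None, 4]` evaluates to ALARM, while `[2, 4, None, 4]` stays OK because the furthest-back period pulled into consideration is not breaching.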
Percentile-Based CloudWatch Alarms and Low Data Samples
When you set a percentile as the statistic for an alarm, you can specify what to do when there is not enough data for a good statistical assessment. You can choose to have the alarm evaluate the statistic anyway and possibly change the alarm state. Or, you can have the alarm ignore the metric while the sample size is low, and wait to evaluate it until there is enough data to be statistically significant.
For percentiles between 0.5 and 1.00, this setting is used when there are fewer than 10/(1-percentile) data points during the evaluation period. For example, this setting would be used if there were fewer than 1000 samples for an alarm on a p99 percentile. For percentiles between 0 and 0.5, the setting is used when there are fewer than 10/percentile data points.
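The cutoff can be computed directly from the percentile; a quick sketch:

```python
def min_sample_size(percentile):
    """Minimum number of data points needed in the evaluation period for
    CloudWatch to consider a percentile statistically significant.
    `percentile` is a fraction, e.g. 0.99 for p99."""
    if percentile >= 0.5:
        return 10 / (1 - percentile)
    return 10 / percentile
```

For example, a p99 alarm needs about 1000 samples, and a p10 alarm about 100; with fewer than that, the low-sample behavior you chose applies.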
Common Features of CloudWatch Alarms
The following features apply to all CloudWatch alarms:
You can create up to 5000 alarms per region per AWS account. To create or update an alarm, you use the PutMetricAlarm API action (mon-put-metric-alarm command).
You can list any or all of the currently configured alarms, and list any alarms in a particular state, by using the DescribeAlarms API action (mon-describe-alarms). You can further filter the list by time range.
You can disable and enable alarms by using the DisableAlarmActions and EnableAlarmActions API actions (mon-disable-alarm-actions and mon-enable-alarm-actions).
You can test an alarm by setting it to any state by using the SetAlarmState API action (mon-set-alarm-state). This temporary state change lasts only until the next alarm comparison occurs.
You can create an alarm by using the PutMetricAlarm API action (mon-put-metric-alarm) before you've created a custom metric. For the alarm to be valid, you must include all of the dimensions for the custom metric, in addition to the metric namespace and metric name, in the alarm definition.
Finally, you can view an alarm's history by using the DescribeAlarmHistory API action (mon-describe-alarm-history). CloudWatch preserves alarm history for two weeks. Each state transition is marked with a unique time stamp. In rare cases, your history might show more than one notification for a state change. The time stamp enables you to confirm unique state changes.
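Because a single state change can appear more than once in the history, the unique time stamps are what identify distinct transitions. The following sketch deduplicates history items by time stamp; the item shape loosely follows DescribeAlarmHistory output, and the sample records are hypothetical.

```python
def unique_state_changes(history_items):
    """Keep one history item per unique Timestamp, oldest first."""
    seen = set()
    unique = []
    for item in sorted(history_items, key=lambda x: x["Timestamp"]):
        if item["Timestamp"] not in seen:
            seen.add(item["Timestamp"])
            unique.append(item)
    return unique

# Hypothetical history with a duplicated notification for one transition.
history = [
    {"Timestamp": "2016-01-02T03:05:00Z",
     "HistorySummary": "Alarm updated from OK to ALARM"},
    {"Timestamp": "2016-01-02T03:05:00Z",
     "HistorySummary": "Alarm updated from OK to ALARM"},
    {"Timestamp": "2016-01-02T04:05:00Z",
     "HistorySummary": "Alarm updated from ALARM to OK"},
]
```

Here `unique_state_changes(history)` collapses the duplicate notification, leaving two distinct state transitions.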
Some AWS resources do not send metric data to CloudWatch under certain conditions.
For example, Amazon EBS may not send metric data for an available volume that is not attached to an Amazon EC2 instance, because there is no metric activity to be monitored for that volume. If you have an alarm set for such a metric, you may notice its state change to INSUFFICIENT_DATA. This may simply be an indication that your resource is inactive, and does not necessarily mean that there is a problem.