CloudWatch Metrics for Your Classic Load Balancer

Elastic Load Balancing publishes data points to Amazon CloudWatch for your load balancers and your back-end instances. CloudWatch enables you to retrieve statistics about those data points as an ordered set of time-series data, known as metrics. Think of a metric as a variable to monitor, and the data points as the values of that variable over time. For example, you can monitor the total number of healthy EC2 instances for a load balancer over a specified time period. Each data point has an associated time stamp and an optional unit of measurement.

You can use metrics to verify that your system is performing as expected. For example, you can create a CloudWatch alarm to monitor a specified metric and initiate an action (such as sending a notification to an email address) if the metric goes outside what you consider an acceptable range.

For more information about Amazon CloudWatch, see the Amazon CloudWatch Developer Guide.

Elastic Load Balancing Metrics

Elastic Load Balancing reports metrics to CloudWatch only when requests are flowing through the load balancer. If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals. If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.

All CloudWatch statistics are available for every Elastic Load Balancing metric, but not every statistic is meaningful for every metric. For each metric, be aware of the statistics that provide the most useful information.

Elastic Load Balancing provides the following CloudWatch metrics.

Metric

Description

HealthyHostCount, UnHealthyHostCount

The number of healthy and unhealthy instances registered with your load balancer. A newly registered instance is considered healthy after it passes the first health check. An instance is considered unhealthy after it exceeds the unhealthy threshold configured for health checks. An unhealthy instance is considered healthy again after it meets the healthy threshold configured for health checks. If cross-zone load balancing is enabled, the number of healthy instances for the LoadBalancerName dimension is calculated across all Availability Zones.

Reporting criteria: There are registered instances

Statistics: The most useful statistics are average, min, and max. These statistics are determined by the load balancer nodes. Note that some load balancer nodes might determine that an instance is unhealthy for a brief period while other nodes determine that it is healthy.

Example: Suppose that your load balancer has 2 instances in us-west-2a and 2 instances in us-west-2b, us-west-2a has 1 unhealthy instance, and us-west-2b has no unhealthy instances. With the AvailabilityZone dimension, there is an average of 1 healthy and 1 unhealthy instance in us-west-2a, and an average of 2 healthy and 0 unhealthy instances in us-west-2b.

RequestCount

The number of requests completed or connections made during the specified interval (1 or 5 minutes).

[HTTP listener] The number of requests received and routed, including HTTP error responses from the registered instances.

[TCP listener] The number of connections made to the registered instances.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that min, max, and average all return 1.

Example: Suppose that your load balancer has 2 instances in us-west-2a and 2 instances in us-west-2b, and that 100 requests are sent to the load balancer. There are 60 requests sent to us-west-2a, with each instance receiving 30 requests, and 40 requests sent to us-west-2b, with each instance receiving 20 requests. With the AvailabilityZone dimension, there is a sum of 60 requests in us-west-2a and 40 requests in us-west-2b. With the LoadBalancerName dimension, there is a sum of 100 requests.

Latency

[HTTP listener] The time elapsed, in seconds, after the request leaves the load balancer until the headers of the response are received.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is average. Use max to determine whether some requests are taking substantially longer than the average. Note that min is typically not useful.

Example: Suppose that your load balancer has 2 instances in us-west-2a and 2 instances in us-west-2b, and that requests sent to 1 instance in us-west-2a have a higher latency. The average for us-west-2a has a higher value than the average for us-west-2b.

SurgeQueueLength

The total number of requests that are pending routing. The load balancer queues a request if it is unable to establish a connection with a healthy instance in order to route the request. The maximum size of the queue is 1,024. Additional requests are rejected when the queue is full. For more information, see SpilloverCount.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is max, because it represents the peak of queued requests. The average statistic can be useful in combination with min and max to determine the range of queued requests. Note that sum is not useful.

Example: Suppose that your load balancer has us-west-2a and us-west-2b enabled, and that instances in us-west-2a are experiencing high latency and are slow to respond to requests. As a result, the surge queue for the load balancer nodes in us-west-2a fills, with clients likely experiencing increased response times. If this continues, the load balancer will likely have spillovers (see the SpilloverCount metric). If us-west-2b continues to respond normally, the max for the load balancer will be the same as the max for us-west-2a.

SpilloverCount

The total number of requests that were rejected because the surge queue is full.

[HTTP listener] The load balancer returns an HTTP 503 error code.

[TCP listener] The load balancer closes the connection.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that average, min, and max are reported per load balancer node and are not typically useful.

Example: Suppose that your load balancer has us-west-2a and us-west-2b enabled, and that instances in us-west-2a are experiencing high latency and are slow to respond to requests. As a result, the surge queue for the load balancer node in us-west-2a fills, resulting in spillover. If us-west-2b continues to respond normally, the sum for the load balancer will be the same as the sum for us-west-2a.

HTTPCode_ELB_4XX

[HTTP listener] The number of HTTP 4XX client error codes generated by the load balancer. Client errors are generated when a request is malformed or incomplete.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that min, max, and average are all 1.

Example: Suppose that your load balancer has us-west-2a and us-west-2b enabled, and that client requests include a malformed request URL. As a result, client errors would likely increase in all Availability Zones. The sum for the load balancer is the sum of the values for the Availability Zones.

HTTPCode_ELB_5XX

[HTTP listener] The number of HTTP 5XX server error codes generated by the load balancer. This count does not include any response codes generated by the registered instances. The metric is reported if there are no healthy instances registered to the load balancer, or if the request rate exceeds the capacity of the instances (spillover) or the load balancer.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that min, max, and average are all 1.

Example: Suppose that your load balancer has us-west-2a and us-west-2b enabled, and that instances in us-west-2a are experiencing high latency and are slow to respond to requests. As a result, the surge queue for the load balancer nodes in us-west-2a fills and clients receive a 503 error. If us-west-2b continues to respond normally, the sum for the load balancer equals the sum for us-west-2a.

HTTPCode_Backend_2XX, HTTPCode_Backend_3XX, HTTPCode_Backend_4XX, HTTPCode_Backend_5XX

[HTTP listener] The number of HTTP response codes generated by registered instances. This count does not include any response codes generated by the load balancer.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that min, max, and average are all 1.

Example: Suppose that your load balancer has 2 instances in us-west-2a and 2 instances in us-west-2b, and that requests sent to 1 instance in us-west-2a result in HTTP 500 responses. The HTTPCode_Backend_5XX sum for us-west-2a includes these error responses, while the sum for us-west-2b does not include them. Therefore, the HTTPCode_Backend_5XX sum for the load balancer equals the sum for us-west-2a.

BackendConnectionErrors

The number of connections that were not successfully established between the load balancer and the registered instances. Because the load balancer retries the connection when there are errors, this count can exceed the request rate. Note that this count also includes any connection errors related to health checks.

Reporting criteria: There is a nonzero value

Statistics: The most useful statistic is sum. Note that average, min, and max are reported per load balancer node and are not typically useful. However, the difference between the minimum and maximum (or peak to average or average to trough) might be useful to determine whether a load balancer node is an outlier.

Example: Suppose that your load balancer has 2 instances in us-west-2a and 2 instances in us-west-2b, and that attempts to connect to 1 instance in us-west-2a result in back-end connection errors. The sum for us-west-2a includes these connection errors, while the sum for us-west-2b does not include them. Therefore, the sum for the load balancer equals the sum for us-west-2a.
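
The relationship between SurgeQueueLength and SpilloverCount described in the table above can be illustrated with a toy model. This is only a sketch of the documented behavior (a queue limit of 1,024 and rejection once the queue is full); the class and method names are hypothetical and do not correspond to any AWS API.

```python
from collections import deque

SURGE_QUEUE_MAX = 1024  # maximum queue size stated in the SurgeQueueLength description


class SurgeQueueModel:
    """Toy model of one load balancer node's surge queue (illustrative only)."""

    def __init__(self, capacity=SURGE_QUEUE_MAX):
        self.capacity = capacity
        self.pending = deque()
        self.peak_length = 0      # comparable to the max statistic of SurgeQueueLength
        self.spillover_count = 0  # comparable to the sum statistic of SpilloverCount

    def enqueue(self, request_id):
        """Queue a request that cannot be routed yet; return False if it is rejected."""
        if len(self.pending) >= self.capacity:
            self.spillover_count += 1  # queue is full, so the request spills over
            return False
        self.pending.append(request_id)
        self.peak_length = max(self.peak_length, len(self.pending))
        return True

    def route_next(self):
        """Remove and return the oldest pending request, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None


# 1,100 requests arrive while no instance is accepting connections.
node = SurgeQueueModel()
accepted = sum(node.enqueue(i) for i in range(1100))
print(accepted, node.peak_length, node.spillover_count)
```

In this model the first 1,024 requests are queued and the remaining 76 are counted as spillover, which is why the max of SurgeQueueLength and the sum of SpilloverCount are the statistics worth watching together.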

Statistics for Elastic Load Balancing Metrics

CloudWatch provides statistics based on the metric data points published by Elastic Load Balancing. Statistics are metric data aggregations over a specified period of time. When you request statistics, the returned data stream is identified by the metric name and dimension. A dimension is a name/value pair that uniquely identifies a metric. For example, you can request statistics for all the healthy EC2 instances behind a load balancer launched in a specific Availability Zone.

The min and max statistics reflect the minimum and maximum reported by the individual load balancer nodes. For example, suppose there are 2 load balancer nodes. One node has HealthyHostCount with a min of 2, a max of 10, and an average of 6, while the other node has HealthyHostCount with a min of 1, a max of 5, and an average of 3. Therefore, the load balancer has a min of 1, a max of 10, and an average of about 4.

The sum statistic is the aggregate value across all load balancer nodes. Because metrics include multiple reports per period, sum is only applicable to metrics that are aggregated across all load balancer nodes, such as RequestCount, HTTPCode_ELB_4XX, HTTPCode_ELB_5XX, HTTPCode_Backend_XXX, BackendConnectionErrors, and SpilloverCount.

The count statistic is the number of samples measured. Because metrics are gathered based on sampling intervals and events, count is typically not useful. For example, with HealthyHostCount, count is based on the number of samples that each load balancer node reports, not the number of healthy hosts.
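
The way per-node reports roll up into load balancer statistics can be sketched in a few lines. The sample values below are hypothetical and were chosen so that the per-node min, max, and average match the HealthyHostCount example above; the actual number of samples each node reports is an assumption.

```python
def aggregate_node_samples(samples_by_node):
    """Combine per-node samples into load-balancer-level statistics."""
    all_samples = [value for samples in samples_by_node.values() for value in samples]
    if not all_samples:
        return None  # no data points reported for the period
    return {
        "min": min(all_samples),
        "max": max(all_samples),
        "sum": sum(all_samples),
        "count": len(all_samples),  # number of samples, not number of hosts
        "average": sum(all_samples) / len(all_samples),
    }


# Hypothetical HealthyHostCount samples reported by two load balancer nodes.
healthy_host_samples = {
    "node-a": [2, 10],       # min 2, max 10, average 6
    "node-b": [1, 5, 3, 3],  # min 1, max 5, average 3
}
stats = aggregate_node_samples(healthy_host_samples)
print(stats)
```

Because the combined average is weighted by how many samples each node reports, it is not simply the mean of the two per-node averages. The sum (24) and count (6) in this output also show why those two statistics say little about the number of healthy hosts.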

Dimensions for Elastic Load Balancing Metrics

To filter the metrics for Elastic Load Balancing, you can use the following dimensions.

Dimension

Description

LoadBalancerName

Filter the metric data by the specified load balancer.

AvailabilityZone

Filter the metric data by the specified Availability Zone.
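
As a sketch of how these dimensions might be used programmatically, the following builds the parameters for the CloudWatch GetMetricStatistics operation as exposed by the AWS SDK for Python (boto3). The load balancer name `my-load-balancer` and the helper names are hypothetical, and `fetch_statistics` assumes that boto3 is installed and that AWS credentials and a Region are configured.

```python
from datetime import datetime, timedelta, timezone

ELB_NAMESPACE = "AWS/ELB"  # CloudWatch namespace for Classic Load Balancer metrics


def build_statistics_request(metric_name, statistic, load_balancer_name,
                             availability_zone=None, minutes=60,
                             period_seconds=60, now=None):
    """Build keyword arguments for get_metric_statistics, filtered by dimension."""
    end = now or datetime.now(timezone.utc)
    dimensions = [{"Name": "LoadBalancerName", "Value": load_balancer_name}]
    if availability_zone is not None:
        dimensions.append({"Name": "AvailabilityZone", "Value": availability_zone})
    return {
        "Namespace": ELB_NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": [statistic],  # for example "Sum", "Average", "Minimum", "Maximum"
    }


def fetch_statistics(request):
    """Send the request to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3
    cloudwatch = boto3.client("cloudwatch")
    return cloudwatch.get_metric_statistics(**request)["Datapoints"]


# Hypothetical usage: RequestCount sums for one Availability Zone over the last hour.
request = build_statistics_request("RequestCount", "Sum", "my-load-balancer",
                                   availability_zone="us-west-2a")
print(request["Dimensions"])
```

Omitting `availability_zone` returns data for the LoadBalancerName dimension alone, which corresponds to the load balancer totals in the examples above.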

View CloudWatch Metrics for Your Load Balancer

You can view the CloudWatch metrics for your load balancers using the Amazon EC2 console. These metrics are displayed as monitoring graphs. The monitoring graphs show data points if the load balancer is active and receiving requests.

Alternatively, you can view metrics for your load balancer using the CloudWatch console.

To view metrics using the Amazon EC2 console

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. On the navigation pane, under LOAD BALANCING, choose Load Balancers.

  3. Select your load balancer.

  4. Choose the Monitoring tab.

  5. (Optional) To filter the results by time, select a time range from Showing data for.

  6. To get a larger view of a single metric, select its graph. The following metrics are available:

    • Healthy Hosts — HealthyHostCount

    • Unhealthy Hosts — UnHealthyHostCount

    • Average Latency — Latency

    • Sum Requests — RequestCount

    • Backend Connection Errors — BackendConnectionErrors

    • Surge Queue Length — SurgeQueueLength

    • Spillover Count — SpilloverCount

    • Sum HTTP 2XXs — HTTPCode_Backend_2XX

    • Sum HTTP 4XXs — HTTPCode_Backend_4XX

    • Sum HTTP 5XXs — HTTPCode_Backend_5XX

    • Sum ELB HTTP 4XXs — HTTPCode_ELB_4XX

    • Sum ELB HTTP 5XXs — HTTPCode_ELB_5XX

To view metrics using the CloudWatch console

  1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.

  2. In the navigation pane, under Metrics, choose ELB. By default, all metrics are displayed.

  3. To display only the metrics reported for your load balancers, choose Per-LB Metrics. To view the metrics for a single load balancer, type its name in the search box.

  4. To display only the metrics reported for your load balancers by Availability Zone, choose Per LB, per AZ Metrics. To view the metrics for a single load balancer, type its name in the search box. To view the metrics for a single Availability Zone, type its name in the search box.

Create CloudWatch Alarms for Your Load Balancer

An alarm watches a single metric over the time period that you specify. Depending on the value of the metric relative to a threshold that you define, the alarm can send one or more notifications using Amazon SNS, a service that enables applications, end users, and devices to instantly send and receive notifications. For more information, see Get Started with Amazon SNS.

An alarm sends notifications to Amazon SNS when the specified metric reaches the defined range and remains in that range for a specified period of time. An alarm has three possible states:

  • OK—The value of the metric is within the range you've specified.

  • ALARM—The value of the metric is outside the range that you've specified for the specified period of time.

  • INSUFFICIENT_DATA—Either the metric is not yet available or not enough data is available for the metric to determine the alarm state.

Whenever the state of an alarm changes, CloudWatch uses Amazon SNS to send a notification to the email addresses that you specified.
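
The three states can be illustrated with a simplified evaluation function. This is a minimal sketch that assumes a "greater than threshold" comparison and treats any missing data point as insufficient data; it is not how CloudWatch itself is implemented, and the function name is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods):
    """Return the alarm state for a list of per-period values, oldest first.

    A value of None stands for a period in which no data was reported.
    Assumes evaluation_periods is at least 1.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods or any(value is None for value in recent):
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        return "ALARM"  # outside the acceptable range for every evaluated period
    return "OK"


# Hypothetical average latency values, in seconds, for consecutive 5-minute periods.
print(evaluate_alarm([0.2, 130.0], threshold=120, evaluation_periods=1))
print(evaluate_alarm([130.0, 0.3], threshold=120, evaluation_periods=1))
print(evaluate_alarm([], threshold=120, evaluation_periods=1))
```

Because Elastic Load Balancing does not report a metric when there is no data for it, an idle load balancer can leave an alarm without recent data points, which is the case the last call models.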

Use the following procedure to create an alarm for your load balancer using the Amazon EC2 console. The alarm sends notifications to an SNS topic whenever the load balancer's latency is above 120 seconds for 1 consecutive period of 5 minutes. Note that a short period creates a more sensitive alarm, while a longer period can mitigate brief spikes in a metric.

Note

Alternatively, you can create an alarm for your load balancer using the CloudWatch console. For more information, see Send Email Based on Load Balancer Alarm in the Amazon CloudWatch Developer Guide.

To create an alarm for your load balancer

  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. On the navigation pane, under LOAD BALANCING, choose Load Balancers.

  3. Select your load balancer.

  4. On the Monitoring tab, choose Create Alarm.

  5. If you have an SNS topic that you want to use, select it from Send a notification to. Otherwise, create an SNS topic as follows:

    1. Choose create topic.

    2. For Send a notification to, type a name for your topic.

    3. For With these recipients, type the email addresses of the recipients to notify, separated by commas. You can enter up to 10 email addresses. Each recipient receives an email from Amazon SNS with a link to subscribe to the SNS topic in order to receive notifications.

  6. Define the threshold for your alarm as follows. For Whenever, select Average and Average Latency. For Is, select > and enter 120. For For at least, type 1 and select a consecutive period of 5 minutes.

  7. For Name of alarm, a name is automatically generated for you. If you prefer, you can type a different name.

  8. Choose Create Alarm.
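
An equivalent alarm could also be defined programmatically. The following sketch builds the parameters for the CloudWatch PutMetricAlarm operation as exposed by the AWS SDK for Python (boto3), using the same threshold as the procedure above. The alarm name pattern, the load balancer name, and the SNS topic ARN are hypothetical placeholders, and `create_alarm` assumes that boto3 is installed and that AWS credentials and a Region are configured.

```python
def build_latency_alarm(load_balancer_name, topic_arn):
    """Build keyword arguments for put_metric_alarm matching the console procedure."""
    return {
        "AlarmName": f"{load_balancer_name}-high-latency",  # hypothetical naming scheme
        "Namespace": "AWS/ELB",
        "MetricName": "Latency",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": load_balancer_name}],
        "Statistic": "Average",
        "Period": 300,             # one period of 5 minutes, in seconds
        "EvaluationPeriods": 1,    # 1 consecutive period
        "Threshold": 120.0,        # seconds of latency
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the notification
    }


def create_alarm(params):
    """Create or update the alarm (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical load balancer name and SNS topic ARN.
alarm = build_latency_alarm(
    "my-load-balancer",
    "arn:aws:sns:us-west-2:123456789012:my-elb-alerts",
)
print(alarm["AlarmName"], alarm["Threshold"])
```

Keeping the parameters in a plain dictionary makes it straightforward to review the period and threshold before calling `create_alarm(alarm)`.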