Amazon Redshift
Cluster Management Guide (API Version 2012-12-01)

Amazon Redshift Performance Data

Using Amazon CloudWatch metrics for Amazon Redshift, you can get information about your cluster's health and performance and see information at the node level. When working with these metrics, keep in mind that each metric has one or more dimensions associated with it. These dimensions tell you what the metric applies to, that is, the scope of the metric. Amazon Redshift has the following two dimensions:

  • Metrics that have a NodeID dimension provide performance data for the nodes of a cluster. This set of metrics covers both leader and compute nodes. Examples of these metrics include CPUUtilization, ReadIOPS, and WriteIOPS.

  • Metrics that have only a ClusterIdentifier dimension provide performance data for clusters. Examples of these metrics include HealthStatus and MaintenanceMode.

    Note

    In some cases, a cluster-specific metric represents an aggregation of node behavior. In these cases, take care when interpreting the metric value, because the leader node's behavior is aggregated with that of the compute nodes.

For general information about CloudWatch metrics and dimensions, see Amazon CloudWatch Concepts in the Amazon CloudWatch User Guide.

For a further description of CloudWatch metrics for Amazon Redshift, see the following sections.
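As an illustration of how the two dimensions scope a request, the following sketch builds the parameters for a CloudWatch metric query. It makes no network call. The helper names `redshift_dimensions` and `metric_request`, the cluster name, and the default time window are hypothetical. The sketch also assumes that a node-level request passes both ClusterIdentifier and NodeID. The resulting dictionary is shaped so that it could be passed to a CloudWatch client call such as boto3's `get_metric_statistics`.

```python
from datetime import datetime, timedelta, timezone


def redshift_dimensions(cluster_id, node_id=None):
    """Build the CloudWatch Dimensions list for a cluster or one of its nodes."""
    dims = [{"Name": "ClusterIdentifier", "Value": cluster_id}]
    if node_id is not None:
        # Assumption: node-level metrics are addressed by both dimensions.
        dims.append({"Name": "NodeID", "Value": node_id})
    return dims


def metric_request(metric_name, cluster_id, node_id=None, minutes=60, period=300):
    """Build request parameters for an AWS/Redshift metric (no call is made)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": metric_name,
        "Dimensions": redshift_dimensions(cluster_id, node_id),
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,  # seconds per data point
        "Statistics": ["Average"],
    }


# Cluster-level health versus CPU for a single compute node.
# "my-cluster" is a placeholder identifier.
cluster_req = metric_request("HealthStatus", "my-cluster")
node_req = metric_request("CPUUtilization", "my-cluster", node_id="Compute-0")
```

With boto3 installed and credentials configured, `boto3.client("cloudwatch").get_metric_statistics(**node_req)` would be one way to submit the request.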

Amazon Redshift Metrics

The AWS/Redshift namespace includes the following metrics.

Metric Description
CPUUtilization

The percentage of CPU utilization. For clusters, this metric represents an aggregation of the CPU utilization values of all nodes (leader and compute).

Units: Percent

Dimensions: NodeID, ClusterIdentifier

DatabaseConnections

The number of database connections to a cluster.

Units: Count

Dimensions: ClusterIdentifier

HealthStatus

Indicates the health of the cluster. Every minute the cluster connects to its database and performs a simple query. If it is able to perform this operation successfully, the cluster is considered healthy. Otherwise, the cluster is unhealthy. An unhealthy status can occur when the cluster database is under extremely heavy load or if there is a configuration problem with a database on the cluster.

Note

In Amazon CloudWatch, this metric is reported as 1 or 0, whereas in the Amazon Redshift console it is displayed with the words HEALTHY or UNHEALTHY for convenience. When this metric is displayed in the Amazon Redshift console, sampling averages are ignored and only HEALTHY or UNHEALTHY is displayed. In Amazon CloudWatch, values other than 1 and 0 may occur because of sampling issues. Any value below 1 for HealthStatus is reported as 0 (UNHEALTHY).

Units: 1/0 (HEALTHY/UNHEALTHY in the Amazon Redshift console)

Dimensions: ClusterIdentifier

MaintenanceMode

Indicates whether the cluster is in maintenance mode.

Note

In Amazon CloudWatch, this metric is reported as 1 or 0, whereas in the Amazon Redshift console it is displayed with the words ON or OFF for convenience. When this metric is displayed in the Amazon Redshift console, sampling averages are ignored and only ON or OFF is displayed. In Amazon CloudWatch, values other than 1 and 0 may occur because of sampling issues. Any value greater than 0 for MaintenanceMode is reported as 1 (ON).

Units: 1/0 (ON/OFF in the Amazon Redshift console)

Dimensions: ClusterIdentifier

NetworkReceiveThroughput

The rate at which the node or cluster receives data.

Units: Bytes/second (MB/s in the Amazon Redshift console)

Dimensions: NodeID, ClusterIdentifier

NetworkTransmitThroughput

The rate at which the node or cluster writes data.

Units: Bytes/second (MB/s in the Amazon Redshift console)

Dimensions: NodeID, ClusterIdentifier

PercentageDiskSpaceUsed

The percent of disk space used.

Units: Percent

Dimensions: NodeID, ClusterIdentifier

QueriesCompletedPerSecond

This metric is used to determine Query Throughput. The metric is the average number of queries completed per second, reported in five-minute intervals.

Units: Count/second

Dimensions: latency

QueryDuration

The average amount of time to complete a query. Reported in five-minute intervals.

Units: Microseconds

Dimensions: latency

QueryRuntimeBreakdown

The amount of time all active queries have spent in various stages of execution during the previous five minutes.

Units: Milliseconds

Dimensions: Stage

ReadIOPS

The average number of disk read operations per second.

Units: Count/second

Dimensions: NodeID

ReadLatency

The average amount of time taken for disk read I/O operations.

Units: Seconds

Dimensions: NodeID

ReadThroughput

The average number of bytes read from disk per second.

Units: Bytes (GB/s in the Amazon Redshift console)

Dimensions: NodeID

WLMQueriesCompletedPerSecond

This metric is used to determine Query Throughput for a Workload Management queue. The metric is the average number of queries completed per second for a Workload Management (WLM) queue, reported in five-minute intervals.

Units: Count/second

Dimensions: wlmid

WLMQueryDuration

The average length of time to complete a query for a Workload Management (WLM) queue. Reported in five-minute intervals.

Units: Microseconds

Dimensions: wlmid

WLMQueueLength

The number of queries in the queue for a Workload Management (WLM) queue.

Units: Count

Dimensions: service class

WriteIOPS

The average number of disk write operations per second.

Units: Count/second

Dimensions: NodeID

WriteLatency

The average amount of time taken for disk write I/O operations.

Units: Seconds

Dimensions: NodeID

WriteThroughput

The average number of bytes written to disk per second.

Units: Bytes (GB/s in the Amazon Redshift console)

Dimensions: NodeID
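The notes for HealthStatus and MaintenanceMode describe how averaged samples between 0 and 1 are collapsed to a single state. A minimal sketch of those two rules follows. The function names are hypothetical and not part of any AWS API.

```python
def health_status_label(value):
    """Any HealthStatus value below 1 is treated as 0 (UNHEALTHY)."""
    return "HEALTHY" if value >= 1 else "UNHEALTHY"


def maintenance_mode_label(value):
    """Any MaintenanceMode value greater than 0 is treated as 1 (ON)."""
    return "ON" if value > 0 else "OFF"


# A five-minute average of 0.8 means at least one health check failed.
print(health_status_label(0.8))     # UNHEALTHY
# An average of 0.2 means the cluster was in maintenance for part of the period.
print(maintenance_mode_label(0.2))  # ON
```

The two rules round in opposite directions: a partial HealthStatus average counts as unhealthy, while a partial MaintenanceMode average counts as maintenance being on.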

Amazon Redshift data can be filtered along any of the dimensions in the following table.

Dimension Description
latency

Values are short, medium, and long.

Short is less than 10 seconds, medium is between 10 seconds and 10 minutes, and long is over 10 minutes.

NodeID

Filters requested data that is specific to the nodes of a cluster. NodeID will be either "Leader", "Shared", or "Compute-N" where N is 0, 1, ... for the number of nodes in the cluster. "Shared" means that the cluster has only one node, i.e. the leader node and compute node are combined.

Metrics are reported for both the leader node and compute nodes only for CPUUtilization, NetworkTransmitThroughput, and ReadIOPS. Other metrics that use the NodeID dimension are reported only for compute nodes.

ClusterIdentifier

Filters requested data that is specific to the cluster. Metrics that are specific to clusters include HealthStatus, MaintenanceMode, and DatabaseConnections. In general, metrics for this dimension (for example, ReadIOPS) that are also metrics of nodes represent an aggregate of the node metric data. Take care in interpreting these metrics because they aggregate the behavior of leader and compute nodes.

service class

The identifier for a WLM service class.

Stage

The execution stages for a query. The possible values are:

  • QueryPlanning: Time spent parsing and optimizing SQL statements.

  • QueryWaiting: Time spent waiting in the WLM queue.

  • QueryExecutingRead: Time spent executing read queries.

  • QueryExecutingInsert: Time spent executing insert queries.

  • QueryExecutingDelete: Time spent executing delete queries.

  • QueryExecutingUpdate: Time spent executing update queries.

  • QueryExecutingCtas: Time spent executing create table as ... queries.

  • QueryExecutingUnload: Time spent executing unload queries.

  • QueryExecutingCopy: Time spent executing copy queries.

  • QueryCommit: Time spent committing.

wlmid

The identifier for a Workload Management Queue.
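The latency and NodeID values described in the table above can be derived mechanically. The sketch below classifies a query duration in microseconds (the unit used by QueryDuration) into a latency value and lists the NodeID values expected for a cluster. The function names are hypothetical. Two assumptions apply: durations of exactly 10 seconds or exactly 10 minutes count as medium, and a cluster with one node reports only "Shared".

```python
from typing import List

TEN_SECONDS_US = 10 * 1_000_000
TEN_MINUTES_US = 10 * 60 * 1_000_000


def latency_bucket(duration_us: int) -> str:
    """Map a query duration in microseconds to short, medium, or long."""
    if duration_us < TEN_SECONDS_US:
        return "short"
    if duration_us > TEN_MINUTES_US:
        return "long"
    return "medium"  # assumption: both boundary values fall here


def node_ids(compute_nodes: int) -> List[str]:
    """List the NodeID values for a cluster with the given number of compute nodes."""
    if compute_nodes < 1:
        raise ValueError("a cluster has at least one node")
    if compute_nodes == 1:
        # Single-node cluster: leader and compute roles are combined.
        return ["Shared"]
    return ["Leader"] + [f"Compute-{n}" for n in range(compute_nodes)]


print(latency_bucket(2_500_000))  # short (2.5 seconds)
print(node_ids(2))                # ['Leader', 'Compute-0', 'Compute-1']
```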

Amazon Redshift Query/Load Performance Data

In addition to the Amazon CloudWatch metrics, Amazon Redshift provides query and load performance data. Query and load performance data can be used to help you understand the relation between database performance and cluster metrics. For example, if you notice that a cluster's CPU spiked, you can find the spike on the cluster CPU graph and see the queries that were running at that time. Conversely, if you are reviewing a specific query, metric data (like CPU) is displayed in context so that you can understand the query's impact on cluster metrics.

Query and load performance data are not published as Amazon CloudWatch metrics and can only be viewed in the Amazon Redshift console. Query and load performance data are generated by querying your database's system tables (see System Tables Reference in the Amazon Redshift Developer Guide). You can also write your own custom database performance queries, but we recommend starting with the query and load performance data presented in the console. For more information about measuring and monitoring your database performance yourself, see Managing Performance in the Amazon Redshift Developer Guide.
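As a rough idea of what a custom performance query might look like, the sketch below builds a SQL statement that lists the longest-running queries in a time window. This section does not name the tables the console uses. The choice of the STL_QUERY system table, the function name `slowest_queries_sql`, and the `%s` placeholder style (as used by common PostgreSQL drivers) are all assumptions made for illustration.

```python
def slowest_queries_sql(limit: int = 10) -> str:
    """Return SQL for the longest-running queries between two timestamps.

    The two %s placeholders are the window start and end, to be bound by
    the database driver rather than formatted into the string.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT query, starttime, endtime, "
        "DATEDIFF(microseconds, starttime, endtime) AS duration_us, "
        "TRIM(querytxt) AS sql_text "
        "FROM stl_query "  # assumption: STL_QUERY as the data source
        "WHERE starttime >= %s AND starttime < %s "
        "ORDER BY duration_us DESC "
        f"LIMIT {int(limit)}"
    )


sql = slowest_queries_sql(5)
print(sql)
```

Expressing the duration in microseconds keeps the result comparable with the QueryDuration metric.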

The following table describes different aspects of query and load data you can access in the Amazon Redshift console.

Query/Load Data Description
Query Summary

A list of queries in a specified time period. The list can be sorted on values such as query ID, query run time, and status. Access this data in the Queries tab of the cluster detail page.

Query Detail

Provides details on a particular query including:

  • Query properties such as the query ID, type, cluster the query was run on, and run time.

  • Details such as the status of the query and the number of errors.

  • The SQL statement that was run.

  • An explain plan if available.

  • Cluster performance data during the query execution (see Amazon Redshift Performance Data).

Load Summary

Lists all the loads in a specified time period. The list can be sorted on values such as query ID, query run time, and status. Access this data in the Loads tab of the cluster detail page.

Load Detail

Provides details on a particular load operation including:

  • Load properties such as the query ID, type, cluster the query was run on, and run time.

  • Details such as the status of the load and the number of errors.

  • The SQL statement that was run.

  • A list of loaded files.

  • Cluster performance data during the load operation (see Amazon Redshift Performance Data).