Container, queue, and database metrics for Amazon MWAA - Amazon Managed Workflows for Apache Airflow

In addition to Apache Airflow metrics, you can monitor the underlying components of your Amazon Managed Workflows for Apache Airflow environments using CloudWatch, which collects raw data and processes it into readable, near real-time metrics. These environment metrics give you greater visibility into key performance indicators, which helps you size your environments appropriately and debug issues with your workflows. These metrics apply to all supported Apache Airflow versions on Amazon MWAA.

Amazon MWAA provides the following metrics:

  - CPU and memory utilization for each Amazon Elastic Container Service (Amazon ECS) container and Amazon Aurora PostgreSQL instance.
  - Amazon Simple Queue Service (Amazon SQS) metrics for the number of messages and the age of the oldest message.
  - Amazon Relational Database Service (Amazon RDS) metrics for database connections, disk queue depth, write operations, latency, and throughput.
  - Amazon RDS Proxy metrics.

These metrics also include the number of base workers, additional workers, schedulers, and web servers.

These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on why a schedule is failing, and troubleshoot underlying issues. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
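
As a rough illustration of the alarm workflow mentioned above, the following Python sketch builds the arguments for the CloudWatch `PutMetricAlarm` API for a CPU threshold on one Amazon MWAA component. The dimension names and values (`Cluster`, `Environment`), the alarm name, the threshold, and the SNS topic ARN are assumptions made for illustration; confirm the actual dimensions of your metrics in the CloudWatch console before using anything like this.

```python
from typing import Any, Dict


def build_cpu_alarm(environment_name: str, component: str,
                    threshold_percent: float, sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch PutMetricAlarm (no AWS call is made)."""
    return {
        "AlarmName": f"mwaa-{environment_name}-{component}-cpu-high",  # hypothetical name
        "Namespace": "AWS/MWAA",
        "MetricName": "CPUUtilization",
        # ASSUMPTION: these dimension names and values are illustrative only.
        "Dimensions": [
            {"Name": "Cluster", "Value": component},
            {"Name": "Environment", "Value": environment_name},
        ],
        "Statistic": "Average",
        "Period": 300,              # seconds per evaluation window
        "EvaluationPeriods": 3,     # alarm after 3 consecutive breaching windows
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
        "TreatMissingData": "missing",
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameter builder separate from the API call makes it possible to review or unit test the alarm definition without AWS access.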

Terms

Namespace

A namespace is a container for the CloudWatch metrics of an AWS service. For Amazon MWAA, the namespace is AWS/MWAA.

CloudWatch metrics

A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.

Dimension

A dimension is a name/value pair that is part of the identity of a metric.

Unit

Each statistic has a unit of measure. For Amazon MWAA, units include Count, Percent, Bytes, and Seconds.
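
To make the terms above concrete, the following Python sketch shows how a metric's identity is formed from its namespace, name, and dimensions, which is why CPUUtilization can exist separately for containers and for database instances. The sample entries mimic the shape of a CloudWatch `ListMetrics` response, but their dimension values are hypothetical, and the metrics in a real environment may carry additional dimensions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical entries shaped like items in a ListMetrics response.
SAMPLE_METRICS = [
    {"Namespace": "AWS/MWAA", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "Cluster", "Value": "Scheduler"}]},
    {"Namespace": "AWS/MWAA", "MetricName": "CPUUtilization",
     "Dimensions": [{"Name": "Database", "Value": "WRITER"}]},
    {"Namespace": "AWS/MWAA", "MetricName": "ApproximateNumberOfMessagesVisible",
     "Dimensions": [{"Name": "Queue", "Value": "example-queue"}]},
]


def metric_identity(metric: dict) -> Tuple[str, str, frozenset]:
    """Return the (namespace, name, dimensions) triple that identifies a metric."""
    dims = frozenset((d["Name"], d["Value"]) for d in metric.get("Dimensions", []))
    return (metric["Namespace"], metric["MetricName"], dims)


def dimension_names_by_metric(metrics: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each metric name to the sorted dimension names it appears under."""
    names = defaultdict(set)
    for metric in metrics:
        for dim in metric.get("Dimensions", []):
            names[metric["MetricName"]].add(dim["Name"])
    return {metric_name: sorted(dims) for metric_name, dims in names.items()}


def list_mwaa_metrics() -> List[dict]:
    """Fetch every metric in the AWS/MWAA namespace. Requires boto3 and credentials."""
    import boto3  # imported lazily; not needed for the helpers above

    paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    found: List[dict] = []
    for page in paginator.paginate(Namespace="AWS/MWAA"):
        found.extend(page["Metrics"])
    return found
```

With the sample data, the two CPUUtilization entries produce two different identities because their dimensions differ.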

Dimensions

This section describes the dimensions that group Amazon MWAA metrics in CloudWatch.

| Dimension | Description |
| --- | --- |
| Cluster | Metrics for the minimum of three Amazon ECS containers that an Amazon MWAA environment uses to run Apache Airflow components: scheduler, worker, and web server. |
| Queue | Metrics for the Amazon SQS queues that decouple the scheduler from workers. When workers read messages, those messages are considered in-flight and are not available to other workers. Messages become available for other workers to read if they are not deleted before the 12-hour visibility timeout. |
| Database | Metrics for the Aurora clusters used by Amazon MWAA. This includes metrics for the primary database instance and a read replica that supports read operations. Amazon MWAA publishes database metrics for both READER and WRITER instances. |

Accessing metrics in the CloudWatch console

This section describes how to access your Amazon MWAA metrics in CloudWatch.

To view performance metrics for a dimension
  1. Open the Metrics page on the CloudWatch console.

  2. Use the AWS Region selector to select your region.

  3. Choose the AWS/MWAA namespace.

  4. In the All metrics tab, choose a dimension. For example, Cluster.

  5. Choose a CloudWatch metric for a dimension. For example, NumSchedulers or CPUUtilization. Then, choose Graph all search results.

  6. Choose the Graphed metrics tab to view performance metrics.
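
The console steps above have a programmatic counterpart in the CloudWatch `GetMetricData` API. The following Python sketch builds one query for a metric in the AWS/MWAA namespace and, optionally, fetches the results with boto3. The function names and the dimension passed in the usage example are assumptions for illustration, not values confirmed by this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def build_metric_query(metric_name: str, dimensions: Dict[str, str],
                       stat: str = "Average", period: int = 60,
                       query_id: str = "m1") -> Dict[str, Any]:
    """Build one MetricDataQueries entry for the AWS/MWAA namespace."""
    return {
        "Id": query_id,  # must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/MWAA",
                "MetricName": metric_name,
                "Dimensions": [{"Name": n, "Value": v} for n, v in dimensions.items()],
            },
            "Period": period,  # seconds
            "Stat": stat,
        },
        "ReturnData": True,
    }


def fetch_metric_data(queries: List[Dict[str, Any]], hours: int = 3,
                      region_name: Optional[str] = None) -> List[Dict[str, Any]]:
    """Call GetMetricData for the last `hours` hours. Requires boto3 and credentials.

    Pagination through NextToken is omitted to keep the sketch short.
    """
    import boto3  # imported lazily so the query builder works without boto3

    client = boto3.client("cloudwatch", region_name=region_name)
    end = datetime.now(timezone.utc)
    response = client.get_metric_data(
        MetricDataQueries=queries,
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return response["MetricDataResults"]
```

For example, `build_metric_query("CPUUtilization", {"Cluster": "Scheduler"})` produces a query for the scheduler's CPU utilization, assuming your metrics carry a `Cluster` dimension with that value.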

List of metrics

The following tables list the cluster, queue, and database service metrics for Amazon MWAA. To view descriptions for metrics directly emitted from Amazon ECS, Amazon SQS, or Amazon RDS, choose the respective documentation link.

Cluster metrics

The following metrics apply to each scheduler, base worker, additional worker, and web server. For more information and descriptions of each cluster metric, see Available metrics and dimensions in the Amazon ECS Developer Guide.

| Namespace | Metric | Unit |
| --- | --- | --- |
| AWS/MWAA | CPUUtilization | Percent |
| AWS/MWAA | MemoryUtilization | Percent |

Evaluating the number of additional worker instances

You can use the component metrics provided under the Cluster dimension, as described in the following procedure, to evaluate the additional workers that an environment is utilizing at a given point in time. You do this by graphing either the CPUUtilization or the MemoryUtilization metric and setting the statistic type to Sample Count. The resulting value is the total number of RUNNING tasks for the AdditionalWorker component. Understanding the number of additional worker instances utilized by your environment can help you gauge how your environment auto scales and allow you to optimize the number of additional workers.

  1. Choose the AWS/MWAA namespace.

  2. In the All metrics tab, choose the Cluster dimension.

  3. Under the Cluster dimension, for the AdditionalWorker, choose either the CPUUtilization or the MemoryUtilization metric.

  4. On the Graphed metrics tab, set Period to 1 Minute and Statistic to Sample Count.

For more information, see Service RUNNING task count in the Amazon Elastic Container Service Developer Guide.
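
The same Sample Count technique can be sketched with the CloudWatch `GetMetricStatistics` API. The following Python example builds a request with a 60-second period and the `SampleCount` statistic, then turns the returned data points into a time-ordered list of task counts. The dimension names and values passed by the caller are assumptions; check the Cluster dimension in the console for the values your environment actually uses.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple


def build_sample_count_request(dimensions: Dict[str, str],
                               start: datetime, end: datetime) -> Dict[str, Any]:
    """Build GetMetricStatistics arguments that count samples per minute."""
    return {
        "Namespace": "AWS/MWAA",
        "MetricName": "CPUUtilization",  # MemoryUtilization works the same way
        # ASSUMPTION: the caller supplies dimension names and values that
        # match the AdditionalWorker metrics shown in the console.
        "Dimensions": [{"Name": n, "Value": v} for n, v in dimensions.items()],
        "StartTime": start,
        "EndTime": end,
        "Period": 60,
        "Statistics": ["SampleCount"],
    }


def running_task_counts(datapoints: List[Dict[str, Any]]) -> List[Tuple[datetime, int]]:
    """Sort data points by time and return (timestamp, task count) pairs."""
    ordered = sorted(datapoints, key=lambda point: point["Timestamp"])
    return [(point["Timestamp"], int(point["SampleCount"])) for point in ordered]


def peak_task_count(datapoints: List[Dict[str, Any]]) -> int:
    """Return the highest per-minute task count, or 0 when there is no data."""
    return max((count for _, count in running_task_counts(datapoints)), default=0)


def fetch_datapoints(request: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Call GetMetricStatistics. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client("cloudwatch").get_metric_statistics(**request)["Datapoints"]
```

CloudWatch does not guarantee the order of returned data points, which is why the helper sorts them before use.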

Database metrics

The following metrics apply to each database instance until it is replaced by an Amazon RDS proxy. For more information and descriptions of the following database metrics, see CloudWatch metrics for Amazon RDS in the Amazon Relational Database Service User Guide.

| Namespace | Metric | Unit |
| --- | --- | --- |
| AWS/MWAA | CPUUtilization | Percent |
| AWS/MWAA | DatabaseConnections | Count |
| AWS/MWAA | DiskQueueDepth | Count |
| AWS/MWAA | FreeableMemory | Bytes |
| AWS/MWAA | VolumeWriteIOPS | Count per five minutes |
| AWS/MWAA | WriteIOPS | Count per second |
| AWS/MWAA | WriteLatency | Seconds |
| AWS/MWAA | WriteThroughput | Bytes per second |

Database metrics for Amazon RDS Proxy (when available)

For more information and descriptions of the following database proxy metrics, see Monitoring Amazon RDS Proxy metrics with CloudWatch in the Amazon Relational Database Service User Guide.

| Namespace | Metric | Unit |
| --- | --- | --- |
| AWS/MWAA | ClientConnections | Count |
| AWS/MWAA | ClientConnectionsClosed | Count |
| AWS/MWAA | ClientConnectionsReceived | Count |
| AWS/MWAA | AvailabilityPercentage | Percentage |
| AWS/MWAA | DatabaseConnectionsCurrentlyInTransaction | Count |
| AWS/MWAA | DatabaseConnectionsSetupFailed | Count |
| AWS/MWAA | DatabaseConnectionsSetupSucceeded | Count |
| AWS/MWAA | DatabaseConnectionRequests | Count |
| AWS/MWAA | DatabaseConnections | Count |
| AWS/MWAA | QueryDatabaseResponseLatency | Microseconds |
| AWS/MWAA | QueryRequests | Count |
| AWS/MWAA | QueryResponseLatency | Microseconds |

Queue metrics

For more information on units and descriptions for the following queue metrics, see Available CloudWatch metrics for Amazon SQS in the Amazon Simple Queue Service Developer Guide.

| Namespace | Metric | Unit |
| --- | --- | --- |
| AWS/MWAA | ApproximateAgeOfOldestMessage | Seconds |
| AWS/MWAA | ApproximateNumberOfMessagesNotVisible (Running tasks) | Count |
| AWS/MWAA | ApproximateNumberOfMessagesVisible (Queued tasks) | Count |
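
As one possible way to read the three queue metrics together, the following Python sketch summarizes their most recent values into queued and running task counts and flags a possible backlog. The `summarize_queue` helper and its 600-second age threshold are hypothetical and chosen only for illustration; a suitable threshold depends on your workload.

```python
from typing import Any, Dict


def summarize_queue(latest: Dict[str, float],
                    max_age_seconds: float = 600.0) -> Dict[str, Any]:
    """Summarize the latest queue metric values for one environment.

    `latest` maps metric names to their most recent values. The backlog
    flag is an illustrative heuristic, not an Amazon MWAA definition.
    """
    queued = int(latest.get("ApproximateNumberOfMessagesVisible", 0))
    running = int(latest.get("ApproximateNumberOfMessagesNotVisible", 0))
    oldest = float(latest.get("ApproximateAgeOfOldestMessage", 0.0))
    return {
        "queued_tasks": queued,
        "running_tasks": running,
        "oldest_message_age_seconds": oldest,
        # Tasks are waiting and the oldest has waited longer than the threshold.
        "backlog_suspected": queued > 0 and oldest > max_age_seconds,
    }
```

A summary like this could feed a dashboard or a custom check, alongside the worker counts from the Cluster metrics.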