Container, queue, and database metrics for Amazon MWAA - Amazon Managed Workflows for Apache Airflow

In addition to Apache Airflow metrics, you can monitor the underlying components of your Amazon Managed Workflows for Apache Airflow environments using CloudWatch, which collects raw data and processes it into readable, near real-time metrics. These environment metrics give you greater visibility into key performance indicators, which helps you size your environments appropriately and debug issues with your workflows. These metrics apply to all supported Apache Airflow versions on Amazon MWAA.

Amazon MWAA provides CPU and memory utilization for each Amazon Elastic Container Service (Amazon ECS) container and Amazon Aurora PostgreSQL instance; Amazon Simple Queue Service (Amazon SQS) metrics for the number of messages and the age of the oldest message; Amazon Relational Database Service (Amazon RDS) metrics for database connections, disk queue depth, write operations, latency, and throughput; and Amazon RDS Proxy metrics. These metrics also include the number of base workers, additional workers, schedulers, and web servers.

These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on why a schedule is failing, and troubleshoot underlying issues. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met. For more information, see the Amazon CloudWatch User Guide.
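As a rough illustration of the alarm workflow described above, the following Python sketch builds the arguments for the CloudWatch PutMetricAlarm operation. It assumes the AWS SDK for Python (boto3) is installed and credentials are configured. The alarm name, dimension names and values, threshold, and evaluation settings are hypothetical placeholders, so copy the real dimension names from the CloudWatch console for your environment.

```python
def build_cpu_alarm(alarm_name, dimensions, threshold_percent=80.0, sns_topic_arn=None):
    """Build keyword arguments for the CloudWatch PutMetricAlarm operation.

    `dimensions` maps dimension names to values. The names used for Amazon MWAA
    are an assumption here; check the metric in the console before relying on them.
    """
    params = {
        "AlarmName": alarm_name,
        "Namespace": "AWS/MWAA",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": "Average",
        "Period": 300,            # evaluate 5-minute averages (illustrative choice)
        "EvaluationPeriods": 3,   # three consecutive periods above the threshold
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
    if sns_topic_arn is not None:
        # Notify an Amazon SNS topic when the alarm fires.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params):
    """Create or update the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical dimension names and values, for illustration only.
example = build_cpu_alarm(
    "mwaa-scheduler-high-cpu",
    {"Environment": "MyAirflowEnvironment", "Cluster": "Scheduler"},
)
```

Calling `create_alarm(example)` would send the request. Keeping the builder separate from the call makes it easy to review the parameters before anything is created.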

Terms

Namespace

A namespace is a container for the CloudWatch metrics of an AWS service. For Amazon MWAA, the namespace is AWS/MWAA.

CloudWatch metrics

A CloudWatch metric represents a time-ordered set of data points that are specific to CloudWatch.

Dimension

A dimension is a name/value pair that is part of the identity of a metric.

Unit

Each statistic has a unit of measure. For Amazon MWAA, units include Count, Percent, Bytes, and Seconds.

Dimensions

This section describes the CloudWatch dimensions used to group Amazon MWAA metrics.

Dimension Description

Cluster

Metrics for the minimum of three Amazon ECS containers that an Amazon MWAA environment uses to run Apache Airflow components: scheduler, worker, and web server.

Queue

Metrics for the Amazon SQS queues that decouple the scheduler from workers. When a worker reads a message, the message is considered in-flight and is not available to other workers. A message becomes available for other workers to read if it is not deleted before the 12-hour visibility timeout expires.

Database

Metrics for the Aurora clusters used by Amazon MWAA. This includes metrics for the primary database instance and a read replica that supports read operations. Amazon MWAA publishes database metrics for both READER and WRITER instances.

Accessing metrics in the CloudWatch console

This section describes how to access your Amazon MWAA metrics in CloudWatch.

To view performance metrics for a dimension
  1. Open the Metrics page on the CloudWatch console.

  2. Use the AWS Region selector to select your region.

  3. Choose the AWS/MWAA namespace.

  4. In the All metrics tab, choose a dimension. For example, Cluster.

  5. Choose a CloudWatch metric for a dimension. For example, NumSchedulers or CPUUtilization. Then, choose Graph all search results.

  6. Choose the Graphed metrics tab to view performance metrics.
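The console steps above can also be approximated programmatically. The following is a minimal sketch that lists the metrics published in the AWS/MWAA namespace with the CloudWatch ListMetrics operation. It assumes boto3 is installed and that the caller supplies a CloudWatch client for the Region where the environment runs; the function names are made up for this example.

```python
def summarize_page(page):
    """Turn one ListMetrics response page into (metric name, dimensions) pairs."""
    return [
        (m["MetricName"], {d["Name"]: d["Value"] for d in m.get("Dimensions", [])})
        for m in page.get("Metrics", [])
    ]


def list_mwaa_metrics(cloudwatch, metric_name=None):
    """List metrics in the AWS/MWAA namespace using a boto3 CloudWatch client.

    Pass `metric_name` (for example "CPUUtilization") to narrow the results.
    """
    kwargs = {"Namespace": "AWS/MWAA"}
    if metric_name is not None:
        kwargs["MetricName"] = metric_name
    results = []
    # ListMetrics is paginated, so walk every page of results.
    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(**kwargs):
        results.extend(summarize_page(page))
    return results
```

For example, `list_mwaa_metrics(boto3.client("cloudwatch", region_name="us-east-1"), "CPUUtilization")` would return one entry per combination of dimensions, which is also a convenient way to discover the exact dimension names and values that your environment publishes. The Region shown is only a placeholder.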

List of metrics

The following tables list the cluster, queue, and database service metrics for Amazon MWAA. To view descriptions for metrics directly emitted from Amazon ECS, Amazon SQS, or Amazon RDS, choose the respective documentation link.

Cluster metrics

The following metrics apply to each scheduler, base worker, additional worker, and web server. For more information and descriptions of each cluster metric, see Available metrics and dimensions in the Amazon ECS Developer Guide.

Namespace | Metric | Unit
AWS/MWAA | CPUUtilization | Percent
AWS/MWAA | MemoryUtilization | Percent

Evaluating the number of additional worker and web server containers

You can use the component metrics provided under the Cluster dimension, as described in the following procedures, to assess how many additional workers or web servers an environment is using at a given point in time. To do this, graph either the CPUUtilization or the MemoryUtilization metric and set the statistic type to Sample Count. The resulting value is the total number of RUNNING tasks for the selected component, for example AdditionalWorker. Knowing how many additional worker instances your environment uses can help you gauge how the environment scales and optimize the number of additional workers.

Workers
To evaluate the number of additional workers using the AWS Management Console
  1. Choose the AWS/MWAA namespace.

  2. In the All metrics tab, choose the Cluster dimension.

  3. Under the Cluster dimension, for the AdditionalWorker, choose either the CPUUtilization or the MemoryUtilization metric.

  4. On the Graphed metrics tab, set Period to 1 Minute and Statistic to Sample Count.

Web servers
To evaluate the number of additional web servers using the AWS Management Console
  1. Choose the AWS/MWAA namespace.

  2. In the All metrics tab, choose the Cluster dimension.

  3. Under the Cluster dimension, for the AdditionalWebservers, choose either the CPUUtilization or the MemoryUtilization metric.

  4. On the Graphed metrics tab, set Period to 1 Minute and Statistic to Sample Count.

For more information, see Service RUNNING task count in the Amazon Elastic Container Service Developer Guide.
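The same Sample Count technique can be sketched in Python with the CloudWatch GetMetricStatistics operation. This is an illustration under assumptions, not an official procedure: boto3 must be installed, and the dimension names and values passed in `dimensions` are placeholders that you should replace with the ones shown for your environment in the console.

```python
from datetime import datetime, timedelta, timezone


def sample_count_request(metric_name, dimensions, minutes=60, now=None):
    """Build GetMetricStatistics arguments that mirror the console settings:
    Period of 1 minute and the Sample Count statistic."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/MWAA",
        "MetricName": metric_name,  # CPUUtilization or MemoryUtilization
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["SampleCount"],
    }


def counts_by_minute(datapoints):
    """Return (timestamp, container count) pairs in chronological order."""
    ordered = sorted(datapoints, key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], int(p["SampleCount"])) for p in ordered]


def fetch_counts(cloudwatch, metric_name, dimensions, minutes=60):
    """Call CloudWatch with a boto3 client and return per-minute counts."""
    request = sample_count_request(metric_name, dimensions, minutes)
    response = cloudwatch.get_metric_statistics(**request)
    return counts_by_minute(response["Datapoints"])
```

CloudWatch does not guarantee the order of returned data points, which is why `counts_by_minute` sorts them by timestamp before reporting.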

Database metrics

The following metrics apply to each database instance associated with the Amazon MWAA environment.

Namespace | Metric | Unit
AWS/MWAA | CPUUtilization | Percent
AWS/MWAA | DatabaseConnections | Count
AWS/MWAA | DiskQueueDepth | Count
AWS/MWAA | FreeableMemory | Bytes
AWS/MWAA | VolumeWriteIOPS | Count per five minutes
AWS/MWAA | WriteIOPS | Count per second
AWS/MWAA | WriteLatency | Seconds
AWS/MWAA | WriteThroughput | Bytes per second
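To retrieve several database metrics in one request, a script could use the CloudWatch GetMetricData operation. The sketch below only builds the query list and converts the FreeableMemory unit. It assumes boto3 for the actual call, and the dimension names and values are supplied by the caller because they are not specified here; the helper names are hypothetical.

```python
# Metric names taken from the database metrics table above.
DATABASE_METRICS = [
    "CPUUtilization",
    "DatabaseConnections",
    "DiskQueueDepth",
    "FreeableMemory",
    "WriteIOPS",
    "WriteLatency",
    "WriteThroughput",
]


def build_queries(metric_names, dimensions, period=300, stat="Average"):
    """Build the MetricDataQueries list for CloudWatch GetMetricData."""
    dims = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    queries = []
    for index, name in enumerate(metric_names):
        queries.append({
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/MWAA",
                    "MetricName": name,
                    "Dimensions": dims,
                },
                "Period": period,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


def bytes_to_mib(value):
    """Convert a FreeableMemory value from bytes to mebibytes."""
    return value / (1024 * 1024)
```

With a boto3 client, the queries would be passed as `cloudwatch.get_metric_data(MetricDataQueries=build_queries(DATABASE_METRICS, dims), StartTime=start, EndTime=end)`, where `dims`, `start`, and `end` are values you provide.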

Queue metrics

For more information on units and descriptions for the following queue metrics, see Available CloudWatch metrics for Amazon SQS in the Amazon Simple Queue Service Developer Guide.

Namespace | Metric | Unit
AWS/MWAA | ApproximateAgeOfOldestTask | Seconds
AWS/MWAA | RunningTasks | Count
AWS/MWAA | QueuedTasks | Count
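One possible way to interpret ApproximateAgeOfOldestTask is to compare it with the 12-hour visibility timeout described for the Queue dimension. The following sketch shows the arithmetic. It assumes the metric behaves like the Amazon SQS age-of-oldest-message metric, and the 75 percent warning level is an arbitrary illustrative value rather than an Amazon MWAA recommendation.

```python
# 12 hours expressed in seconds: 12 * 60 * 60 = 43,200.
VISIBILITY_TIMEOUT_SECONDS = 12 * 60 * 60


def timeout_fraction(age_of_oldest_task_seconds):
    """Return the oldest task's age as a fraction of the visibility timeout."""
    return age_of_oldest_task_seconds / VISIBILITY_TIMEOUT_SECONDS


def is_near_timeout(age_of_oldest_task_seconds, warn_fraction=0.75):
    """Flag tasks whose message may soon become visible to other workers.

    `warn_fraction` is a hypothetical warning level chosen for this example.
    """
    return timeout_fraction(age_of_oldest_task_seconds) >= warn_fraction
```

For instance, an age of 21,600 seconds (6 hours) is half of the timeout and would not be flagged, while 36,000 seconds (10 hours) is above the 75 percent level and would be.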

Application Load Balancer metrics

Application Load Balancer metrics apply to the web servers running in your environment. Amazon MWAA uses these metrics to scale your web servers based on the amount of traffic. For more information on units and descriptions for the following load balancer metrics, see CloudWatch metrics for your Application Load Balancer in the Application Load Balancers User Guide.

Namespace | Metric | Unit
AWS/MWAA | ActiveConnectionCount | Count