Automate ingestion and visualization of Amazon MWAA custom metrics on Amazon Managed Grafana by using Terraform
Created by Faisal Abdullah (AWS) and Satya Vajrapu (AWS)
Summary
This pattern discusses how to use Amazon Managed Grafana to create and monitor custom metrics that are ingested by Amazon Managed Workflows for Apache Airflow (Amazon MWAA). Amazon MWAA serves as the orchestrator for workflows, employing Directed Acyclic Graphs (DAGs) that are scripted in Python. This pattern centers on the monitoring of custom metrics, including the total number of DAGs running within the last hour, the count of passed and failed DAGs each hour, and the average duration of these processes. This analysis shows how Amazon Managed Grafana integrates with Amazon MWAA to enable comprehensive monitoring and insights into the orchestration of workflows within this environment.
Prerequisites and limitations
Prerequisites
An active AWS account with the necessary user permissions to create and manage the following AWS services:
AWS Identity and Access Management (IAM) roles and policies
AWS Lambda
Amazon Managed Grafana
Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Amazon Simple Storage Service (Amazon S3)
Amazon Timestream
Access to a shell environment, which can be a terminal on your local machine or AWS CloudShell.
A shell environment with Git installed and the latest version of the AWS Command Line Interface (AWS CLI) installed and configured. For more information, see Installing or updating to the latest version of the AWS CLI in the AWS CLI documentation.
The following Terraform version installed:
required_version = ">= 1.6.1, < 2.0.0"
You can use tfswitch to switch between different versions of Terraform.
A configured identity source in AWS IAM Identity Center for your AWS account. For more information, see Confirm your identity sources in IAM Identity Center in the IAM Identity Center documentation. You can choose from the default Identity Center directory, Active Directory, or an external identity provider (IdP) such as Okta. For more information, see Related resources.
Limitations
Some AWS services aren’t available in all AWS Regions. For Region availability, see AWS services by Region. For specific endpoints, see Service endpoints and quotas, and choose the link for the service.
Product versions
Terraform
required_version = ">= 1.6.1, < 2.0.0"
Amazon Managed Grafana version 9.4 or later. This pattern was tested on version 9.4.
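The Terraform constraint listed above can be checked before you deploy. The following is a minimal sketch, assuming plain three-part version numbers; the helper names (parse_version, satisfies) are illustrative and aren't part of the pattern's repository.

```python
import re

# Constraint string copied from this pattern.
REQUIRED_VERSION = ">= 1.6.1, < 2.0.0"


def parse_version(text):
    """Turn '1.6.1' (optionally prefixed with 'v') into a comparable tuple."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        # Pre-release suffixes such as '-beta1' are deliberately not handled.
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def satisfies(version, constraint=REQUIRED_VERSION):
    """Return True if the version meets every comma-separated clause."""
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
        "=": lambda a, b: a == b,
    }
    current = parse_version(version)
    for clause in constraint.split(","):
        op, bound = clause.split()
        if not ops[op](current, parse_version(bound)):
            return False
    return True
```

For example, satisfies("1.9.8") returns True and satisfies("2.0.0") returns False. The installed version can be read from the terraform_version field that terraform version -json prints.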
Architecture
The following architecture diagram highlights the AWS services used in the solution.
The preceding diagram steps through the following workflow:
Custom metrics within Amazon MWAA originate from DAGs that run within the environment. The metrics are uploaded to the Amazon S3 bucket in CSV format. The following DAGs use the database querying capabilities of Amazon MWAA:
run-example-dag – This DAG contains sample Python code that defines one or more tasks. It runs every 7 minutes and prints the date. After printing the date, the DAG includes a task to sleep, or pause, execution for a specific duration.
other-sample-dag – This DAG runs every 10 minutes and prints the date. After printing the date, the DAG includes a task to sleep, or pause, execution for a specific duration.
data-extract – This DAG runs every hour, queries the Amazon MWAA database, and collects metrics. After the metrics are collected, this DAG writes them to an Amazon S3 bucket for further processing and analysis.
To streamline data processing, Lambda functions run when they’re triggered by Amazon S3 events, which facilitates the loading of metrics into Timestream.
Timestream, where all the custom metrics from Amazon MWAA are stored, is integrated as a data source within Amazon Managed Grafana.
Users can query the data and construct custom dashboards to visualize key performance indicators and gain insights into the orchestration of workflows within Amazon MWAA.
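The metrics named in the summary (runs in the last hour, passed and failed counts, and average duration) could be computed along the following lines. This is a minimal sketch, not the data-extract DAG from the repository: the run-record fields (start, end, state), the function names, and the CSV column names are all assumptions, and the real DAG reads its rows from the Amazon MWAA database rather than from an in-memory list.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def summarize_runs(runs, now):
    """Aggregate DAG runs that started within the hour before `now`.

    Each run is a dict with hypothetical keys: 'start' (datetime),
    'end' (datetime or None while still running), and 'state' (str).
    """
    window_start = now - timedelta(hours=1)
    recent = [r for r in runs if window_start <= r["start"] <= now]
    # Only finished runs contribute to the average duration.
    durations = [
        (r["end"] - r["start"]).total_seconds()
        for r in recent
        if r["end"] is not None
    ]
    average = round(sum(durations) / len(durations), 2) if durations else 0.0
    return {
        "timestamp": now.isoformat(),
        "total_runs": len(recent),
        "success_runs": sum(r["state"] == "success" for r in recent),
        "failed_runs": sum(r["state"] == "failed" for r in recent),
        "avg_duration_seconds": average,
    }


def to_csv(summary):
    """Serialize one summary row, with a header line, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(summary))
    writer.writeheader()
    writer.writerow(summary)
    return buffer.getvalue()
```

The returned CSV text is what a DAG task would then upload to the metrics bucket.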
Tools
AWS services
AWS IAM Identity Center helps you centrally manage single sign-on (SSO) access to all of your AWS accounts and cloud applications.
AWS Lambda is a compute service that helps you run code without needing to provision or manage servers. It runs your code only when needed and scales automatically, so you pay only for the compute time that you use. In this pattern, AWS Lambda runs the Python code in response to Amazon S3 events and manages the compute resources automatically.
Amazon Managed Grafana is a fully managed data visualization service that you can use to query, correlate, visualize, and alert on your metrics, logs, and traces. This pattern uses Amazon Managed Grafana to create a dashboard for metrics visualization and alerts.
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open source tool used to programmatically author, schedule, and monitor sequences of processes and tasks referred to as workflows. In this pattern, sample DAGs and a metrics extractor DAG are deployed in Amazon MWAA.
Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data. In this pattern, Amazon S3 is used to store DAGs, scripts, and custom metrics in CSV format.
Amazon Timestream for LiveAnalytics is a fast, scalable, fully managed, purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day. Timestream for LiveAnalytics also integrates with commonly used services for data collection, visualization, and machine learning. In this pattern, it’s used to ingest the generated Amazon MWAA custom metrics.
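The way Lambda, Amazon S3, and Timestream fit together in this pattern can be sketched as two small helpers: one that reads the bucket and key from an S3 event notification, and one that turns CSV rows into records in the shape that the Timestream WriteRecords API accepts. This is an illustration under assumptions, not the Lambda code in the repository: the function names, the environment dimension, and the CSV layout (a timestamp column followed by numeric metric columns) are hypothetical.

```python
import csv
import io
from datetime import datetime
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    return [
        # Object keys arrive URL-encoded in S3 event notifications.
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def csv_to_timestream_records(csv_text, environment):
    """Convert CSV rows into one Timestream record per metric column.

    Assumes a 'timestamp' column in ISO 8601 format; every other column
    is treated as a numeric measure. 'environment' is a hypothetical
    dimension used to label the source Amazon MWAA environment.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        moment = datetime.fromisoformat(row.pop("timestamp"))
        time_ms = int(moment.timestamp() * 1000)
        for name, value in row.items():
            records.append({
                "Dimensions": [{"Name": "environment", "Value": environment}],
                "MeasureName": name,
                "MeasureValue": str(float(value)),
                "MeasureValueType": "DOUBLE",
                "Time": str(time_ms),
                "TimeUnit": "MILLISECONDS",
            })
    return records
```

In a Lambda handler, the CSV text would come from the body that the boto3 S3 client's get_object call returns, and the records would be passed to the timestream-write client's write_records call together with a database name and table name. Because WriteRecords limits how many records a single request can carry, larger files would need to be sent in batches.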
Other tools
HashiCorp Terraform is an open source infrastructure as code (IaC) tool that helps you use code to provision and manage cloud infrastructure and resources. This pattern uses a Terraform module to automate the provisioning of infrastructure in AWS.
Code repository
The code for this pattern is available in the GitHub visualize-amazon-mwaa-custom-metrics-grafana repository. The stacks/Infra folder contains the following:
Terraform configuration files for all AWS resources
Grafana dashboard .json file in the grafana folder
Amazon Managed Workflows for Apache Airflow DAGs in the mwaa/dags folder
Lambda code to parse the .csv file and store metrics in the Timestream database in the src folder
IAM policy .json files in the templates folder
Best practices
Terraform must store state about your managed infrastructure and configuration so that it can map real-world resources to your configuration. By default, Terraform stores state locally in a file named terraform.tfstate. It's crucial to ensure the safety and integrity of your Terraform state file because it maintains the current state of your infrastructure. For more information, see Remote State in the Terraform documentation.
Epics
Task | Description | Skills required |
---|---|---|
Deploy the infrastructure. | To deploy the solution infrastructure, do the following:
| AWS DevOps |
Task | Description | Skills required |
---|---|---|
Validate the Amazon MWAA environment. | To validate the Amazon MWAA environment, do the following:
| AWS DevOps, Data engineer |
Verify the DAG schedules. | To view each DAG schedule, go to the Schedule tab in the Airflow UI. Each of the following DAGs has a pre-configured schedule, which runs in the Amazon MWAA environment and generates custom metrics:
You can also see the successful runs of each DAG under the Runs column. | Data engineer, AWS DevOps |
Task | Description | Skills required |
---|---|---|
Configure access to the Amazon Managed Grafana workspace. | The Terraform scripts created the required Amazon Managed Grafana workspace, dashboards, and metrics page. To configure access so that you can view them, do the following:
| AWS DevOps |
Install the Amazon Timestream plugin. | Amazon MWAA custom metrics are loaded into the Timestream database. You use the Timestream plugin to visualize the metrics with Amazon Managed Grafana dashboards. To install the Timestream plugin, do the following:
For more information, see Extend your workspace with plugins in the Amazon Managed Grafana documentation. | AWS DevOps, DevOps engineer |
Task | Description | Skills required |
---|---|---|
View the Amazon Managed Grafana dashboard. | To view the metrics that were ingested into the Amazon Managed Grafana workspace, do the following:
The dashboard metrics page shows the following information:
| AWS DevOps |
Customize the Amazon Managed Grafana dashboard. | To customize the dashboards for future enhancements, do the following:
Alternatively, the source code for this dashboard is available in the | AWS DevOps |
Task | Description | Skills required |
---|---|---|
Pause the Amazon MWAA DAG runs. | To pause the DAG runs, do the following:
| AWS DevOps, Data engineer |
Delete the objects in the Amazon S3 buckets. | To delete the Amazon S3 buckets mwaa-events-bucket-* and mwaa-metrics-bucket-*, follow the instructions for using the Amazon S3 console in Deleting a bucket in the Amazon S3 documentation. | AWS DevOps |
Destroy the resources created by Terraform. | To destroy the resources created by Terraform and the associated local Terraform state file, do the following:
| AWS DevOps |
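The S3 cleanup task in the preceding table points to the console instructions. As a scripted alternative, the following sketch empties buckets whose names start with the two prefixes mentioned in that task. It assumes a boto3 S3 client is passed in (for example, boto3.client("s3")); the function names are illustrative, and the sketch doesn't handle versioned buckets, which would also need their object versions and delete markers removed.

```python
# Prefixes taken from the bucket names in the cleanup task above.
BUCKET_PREFIXES = ("mwaa-events-bucket-", "mwaa-metrics-bucket-")


def matching_buckets(s3_client, prefixes=BUCKET_PREFIXES):
    """List bucket names in the account that start with any given prefix."""
    names = [bucket["Name"] for bucket in s3_client.list_buckets()["Buckets"]]
    return [name for name in names if name.startswith(tuple(prefixes))]


def empty_bucket(s3_client, bucket):
    """Delete every current object in the bucket; return how many were deleted."""
    deleted = 0
    kwargs = {"Bucket": bucket}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # One listing page fits within a single delete_objects request.
            s3_client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
            deleted += len(keys)
        if not page.get("IsTruncated"):
            return deleted
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Passing the client as an argument keeps the helpers easy to exercise with a stub before you run them against a real account.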
Troubleshooting
Issue | Solution |
---|---|
| Upgrade your AWS CLI to the latest version. |
Loading data sources error -
| The error is intermittent. Wait a few minutes, and then refresh your data sources to view the listed Timestream data source. |
Related resources
AWS documentation
AWS videos
Configure IAM Identity Center with Amazon Managed Grafana for authentication, as shown in the following video.
If IAM Identity Center isn’t available, you can also integrate the Amazon Managed Grafana authentication by using an external identity provider (IdP) such as Okta, as shown in the following video.
Additional information
You can create a comprehensive monitoring and alerting solution for your Amazon MWAA environment, enabling proactive management and rapid response to potential issues or anomalies. Amazon Managed Grafana includes the following capabilities:
Alerting – You can configure alerts in Amazon Managed Grafana based on predefined thresholds or conditions. Set up email notifications to alert relevant stakeholders when certain metrics exceed or fall below specified thresholds. For more information, see Grafana alerting in the Amazon Managed Grafana documentation.
Integration – You can integrate Amazon Managed Grafana with various third-party tools such as OpsGenie, PagerDuty, or Slack for enhanced notification capabilities. For example, you can set up webhooks or integrate with APIs to trigger incidents and notifications in these platforms based on alerts generated in Amazon Managed Grafana. In addition, this pattern provides a GitHub repository