
Amazon MWAA Apache Airflow configuration options

Apache Airflow configuration options can be attached to your Amazon Managed Workflows for Apache Airflow (MWAA) environment as environment variables. On the Amazon MWAA console, you can choose an option from the suggested dropdown list or specify any other Apache Airflow v1.10.12 configuration option for your environment. This page describes the Apache Airflow configuration options available in the dropdown list on the Amazon MWAA console, and how to use these options to override Apache Airflow configuration settings.

Prerequisites

To use the steps on this page, you'll need:

  1. The required AWS resources configured for your environment as defined in Get started with Amazon Managed Workflows for Apache Airflow (MWAA).

  2. An execution role with a permissions policy that grants Amazon MWAA access to the AWS resources used by your environment as defined in Amazon MWAA Execution role.

  3. An AWS account whose AWS Identity and Access Management (IAM) permissions give it access to the Amazon S3 console or the AWS Command Line Interface (AWS CLI), as defined in Accessing an Amazon MWAA environment.

How it works

When you create an environment, Amazon MWAA attaches the settings you specify in the Airflow configuration options pane of the Amazon MWAA console to your environment's AWS Fargate container as environment variables. If airflow.cfg contains a setting with the same name, the value you specify on the Amazon MWAA console overrides the value in airflow.cfg.

While we don't expose the airflow.cfg file in the Apache Airflow UI of an Amazon MWAA environment, you can change Apache Airflow configuration options directly on the Amazon MWAA console, and all other settings in airflow.cfg remain in effect.
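The override rule can be pictured with a minimal Python sketch. It assumes a hypothetical `get_option` helper and a small in-memory airflow.cfg; Apache Airflow's real lookup also consults other sources (such as its built-in defaults), so treat this only as an illustration of the precedence between a console option and airflow.cfg.

```python
import configparser
import os
from typing import Mapping, Optional

# Hypothetical airflow.cfg contents, used only for this sketch.
AIRFLOW_CFG = """
[core]
dag_concurrency = 16
parallelism = 32
"""


def get_option(section: str, key: str, cfg_text: str,
               environ: Optional[Mapping[str, str]] = None) -> str:
    """Return an option value, preferring AIRFLOW__SECTION__KEY over airflow.cfg."""
    environ = os.environ if environ is None else environ
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        # A console option arrives as an environment variable and wins.
        return environ[env_name]
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Otherwise fall back to the value in airflow.cfg.
    return parser.get(section, key)


if __name__ == "__main__":
    env = {"AIRFLOW__CORE__PARALLELISM": "40"}
    print(get_option("core", "parallelism", AIRFLOW_CFG, env))      # 40
    print(get_option("core", "dag_concurrency", AIRFLOW_CFG, env))  # 16
```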

Configuration options overview

When you add a configuration on the Amazon MWAA console, Amazon MWAA writes the configuration as an environment variable. For a complete reference, see Apache Airflow v1.10.12 configuration reference in the Apache Airflow reference guide.

  • Listed options. You can choose one of the configuration settings available in the dropdown list. For example, core.dag_concurrency : 16. The configuration setting is translated to your environment's Fargate container as AIRFLOW__CORE__DAG_CONCURRENCY : 16.

  • Custom options. You can also specify Apache Airflow configuration options that are not available in the dropdown list. For example, foo.user : YOUR_USER_NAME. The configuration setting is translated to your environment's Fargate container as AIRFLOW__FOO__USER : YOUR_USER_NAME.
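The translation in both examples is mechanical: the section and key are upper-cased and joined with double underscores after an AIRFLOW__ prefix. The helper names `to_env_var` and `to_environment` below are hypothetical; this minimal sketch only mirrors the translation described above and is not Amazon MWAA's own code.

```python
from typing import Dict


def to_env_var(option: str) -> str:
    """Convert a 'section.key' option name into AIRFLOW__SECTION__KEY."""
    section, sep, key = option.partition(".")
    if not sep or not section or not key:
        raise ValueError(f"expected 'section.key', got {option!r}")
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def to_environment(options: Dict[str, str]) -> Dict[str, str]:
    """Translate a mapping of console options into environment variables."""
    return {to_env_var(name): value for name, value in options.items()}


if __name__ == "__main__":
    console_options = {"core.dag_concurrency": "16", "foo.user": "YOUR_USER_NAME"}
    for name, value in to_environment(console_options).items():
        print(f"{name} : {value}")
```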

Apache Airflow configuration options

The following image shows where you can customize the Apache Airflow configuration options on the Amazon MWAA console.


[Image: the Airflow configuration options pane on the Amazon MWAA console]

Using the Amazon MWAA console

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose Edit.

  4. Choose Next.

  5. Choose Add custom configuration in the Airflow configuration options pane.

  6. Choose a configuration from the dropdown list and enter a value, or type a custom configuration and enter a value.

  7. Choose Add custom configuration for each configuration you want to add.

  8. Choose Save.
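The console steps above can also be scripted. The following sketch assumes the AWS SDK for Python (Boto3) `mwaa` client and its `update_environment` call with the `AirflowConfigurationOptions` parameter; the helper name `update_airflow_options` is hypothetical. To be safe, pass the complete set of options you want attached to the environment, not only the ones you are changing.

```python
from typing import Any, Dict


def update_airflow_options(client: Any, environment_name: str,
                           options: Dict[str, Any]) -> Dict[str, Any]:
    """Attach Airflow configuration options to an existing MWAA environment.

    `client` is expected to be a Boto3 MWAA client, e.g. boto3.client("mwaa").
    """
    # Option names use the console's "section.key" form; values are sent as strings.
    string_options = {name: str(value) for name, value in options.items()}
    return client.update_environment(
        Name=environment_name,
        AirflowConfigurationOptions=string_options,
    )
```

For example, `update_airflow_options(boto3.client("mwaa"), "MyAirflowEnvironment", {"core.dag_concurrency": 16})` requests the same setting as adding core.dag_concurrency : 16 on the console; "MyAirflowEnvironment" is a placeholder name. Passing the client in as a parameter keeps the function testable without AWS credentials.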

Configuration reference

The following sections list the Apache Airflow configuration options available in the dropdown list on the Amazon MWAA console.

Email notifications

The following list shows the email notification configuration options available on the Amazon MWAA console.

Each entry gives the option name, which is the same in the Amazon MWAA console dropdown and in Apache Airflow, followed by a description and an example value.

  • email.email_backend. The Apache Airflow utility used to send email notifications. Example value: airflow.utils.email.send_email_smtp

  • smtp.smtp_host. The hostname of the outbound (SMTP) mail server. Example value: localhost

  • smtp.smtp_starttls. Whether to use STARTTLS to encrypt the connection to the mail server with Transport Layer Security (TLS). Example value: False

  • smtp.smtp_ssl. Whether to connect to the mail server over Secure Sockets Layer (SSL). Example value: True

  • smtp.smtp_port. The Transmission Control Protocol (TCP) port of the mail server. Example value: 25

  • smtp.smtp_mail_from. The sender email address for outbound email. Example value: myemail@domain.com
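These smtp.* options work together when a connection to the mail server is opened. The sketch below is a simplified illustration built on Python's standard `smtplib` and `email.message` modules, not Apache Airflow's own `send_email_smtp` code; the helper names `as_bool`, `build_message`, and `open_smtp` are hypothetical, and the connection classes are parameters so the logic can be exercised without a mail server. It assumes option values are strings such as "True" and "False", as entered on the console.

```python
import smtplib
from email.message import EmailMessage
from typing import Dict


def as_bool(value: str) -> bool:
    """Interpret console strings such as 'True' or 'False'."""
    return str(value).strip().lower() in ("true", "1", "yes")


def build_message(options: Dict[str, str], to: str, subject: str, body: str) -> EmailMessage:
    """Create a message whose sender comes from smtp.smtp_mail_from."""
    msg = EmailMessage()
    msg["From"] = options["smtp.smtp_mail_from"]
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def open_smtp(options: Dict[str, str], smtp_cls=smtplib.SMTP, smtp_ssl_cls=smtplib.SMTP_SSL):
    """Open a connection as described by smtp_host, smtp_port, smtp_ssl and smtp_starttls."""
    host = options["smtp.smtp_host"]
    port = int(options["smtp.smtp_port"])
    if as_bool(options.get("smtp.smtp_ssl", "False")):
        # smtp_ssl: the connection is encrypted from the start.
        return smtp_ssl_cls(host, port)
    conn = smtp_cls(host, port)
    if as_bool(options.get("smtp.smtp_starttls", "False")):
        # smtp_starttls: upgrade the plain connection to TLS.
        conn.starttls()
    return conn
```

With the real `smtplib` classes and a reachable server, `open_smtp(options).send_message(build_message(options, ...))` would send a message. The sketch treats SSL and STARTTLS as alternatives, because enabling both on the same connection is usually a configuration mistake.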

Task configurations

The following list shows the configurations available in the dropdown list for tasks on the Amazon MWAA console.

  • core.default_task_retries. The default number of times to retry an Apache Airflow task. Example value: 3

  • core.parallelism. The maximum number of task instances that can run simultaneously across the entire environment. Example value: 40
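The two task options interact with per-task settings and with work that is already running. In the minimal sketch below, `effective_retries` and `open_slots` are hypothetical helpers: the first assumes that an explicit `retries` value on a task (including one set through `default_args`) takes precedence over core.default_task_retries, and the second assumes core.parallelism acts as a simple cap on concurrently running task instances.

```python
from typing import Optional


def effective_retries(task_retries: Optional[int], default_task_retries: str = "0") -> int:
    """Return the retry count a task ends up with."""
    # An explicit retries value on the task wins; otherwise
    # core.default_task_retries (entered as a string on the console) applies.
    if task_retries is not None:
        return int(task_retries)
    return int(default_task_retries)


def open_slots(parallelism: int, running: int) -> int:
    """How many more task instances may start under core.parallelism."""
    return max(parallelism - running, 0)


if __name__ == "__main__":
    print(effective_retries(None, "3"))  # 3
    print(effective_retries(1, "3"))     # 1
    print(open_slots(40, 35))            # 5
```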

Scheduler configurations

The following list shows the scheduler configurations available in the dropdown list on the Amazon MWAA console.

  • scheduler.catchup_by_default. Whether the scheduler creates DAG runs to "catch up" on schedule intervals that have not yet run since the DAG's start date. A DAG can override this default with its own catchup argument. Example value: False

  • scheduler.scheduler_zombie_task_threshold. The number of seconds a running task can go without a heartbeat before the scheduler marks the task instance as failed and reschedules the task. Example value: 300
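Both scheduler options come down to time arithmetic. The sketch below uses hypothetical helpers and is not Apache Airflow's scheduler code: `runs_to_create` assumes a fixed schedule interval and that a run is created only after its interval has ended, and `is_zombie` assumes the threshold is compared, in seconds, with the time since a running task's last heartbeat.

```python
from datetime import datetime, timedelta
from typing import List


def runs_to_create(start: datetime, now: datetime, interval: timedelta,
                   catchup: bool) -> List[datetime]:
    """Execution dates for which the scheduler would create DAG runs."""
    dates = []
    current = start
    # An interval is scheduled only once it has fully elapsed.
    while current + interval <= now:
        dates.append(current)
        current += interval
    # With catchup disabled, only the most recent completed interval runs.
    return dates if catchup else dates[-1:]


def is_zombie(last_heartbeat: datetime, now: datetime, threshold_seconds: int = 300) -> bool:
    """True if a running task has gone without a heartbeat longer than the threshold."""
    return (now - last_heartbeat).total_seconds() > threshold_seconds
```

For a daily DAG that starts on 2021-01-01 and is checked at noon on 2021-01-04, catchup creates runs for January 1, 2, and 3, while catchup disabled creates only the January 3 run.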

Worker configurations

The following list shows the configurations available in the dropdown list for workers on the Amazon MWAA console.

  • celery.worker_autoscale. The maximum and minimum number of tasks that can run concurrently on any worker using the Celery Executor. The value must be comma-separated in the following order: max_concurrency,min_concurrency. Example value: 16,12
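Because the order of the two numbers matters, it can help to check the value before saving it. `parse_worker_autoscale` below is a hypothetical helper that reads the max_concurrency,min_concurrency format described above; rejecting a minimum larger than the maximum is an assumption about what a sensible value looks like.

```python
from typing import Tuple


def parse_worker_autoscale(value: str) -> Tuple[int, int]:
    """Parse 'max_concurrency,min_concurrency' into a (max, min) tuple."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'max_concurrency,min_concurrency', got {value!r}")
    max_concurrency, min_concurrency = int(parts[0]), int(parts[1])
    if min_concurrency > max_concurrency:
        # A reversed pair such as "12,16" most likely has the order swapped.
        raise ValueError("max_concurrency must come first and be >= min_concurrency")
    return max_concurrency, min_concurrency


if __name__ == "__main__":
    print(parse_worker_autoscale("16,12"))  # (16, 12)
```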

System settings

The following list shows the configurations available in the dropdown list for Apache Airflow system settings on the Amazon MWAA console.

  • core.default_ui_timezone. The default time zone used to display dates and times in the Apache Airflow UI. Example value: America/New_York
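This option changes how times are displayed, not how Apache Airflow stores them (Airflow stores datetimes in UTC). The minimal sketch below converts a UTC timestamp to the zone named in the option. It assumes Python 3.9 or later for the standard `zoneinfo` module and an available time zone database; `to_ui_time` is a hypothetical helper, and an unknown zone name raises `zoneinfo.ZoneInfoNotFoundError`.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_ui_time(utc_dt: datetime, ui_timezone: str = "America/New_York") -> datetime:
    """Convert a UTC timestamp to the time zone named in default_ui_timezone."""
    if utc_dt.tzinfo is None:
        # Treat naive values as UTC, matching how Airflow stores datetimes.
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(ui_timezone))


if __name__ == "__main__":
    # 12:00 UTC in mid-January is 07:00 in New York (UTC-5 in winter).
    print(to_ui_time(datetime(2021, 1, 15, 12, 0)).isoformat())  # 2021-01-15T07:00:00-05:00
```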

Examples and sample code

Example DAG

You can use the following DAG to print the value of the email.email_backend Apache Airflow configuration option. To run it, copy the code to the DAGs folder in your environment's Amazon S3 bucket.

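from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

# Placeholder default arguments; adjust owner and start_date for your environment.
default_args = {"owner": "airflow", "start_date": days_ago(1)}

def print_var(**kwargs):
    # 'conf' in the task context is Airflow's configuration parser.
    email_backend = kwargs['conf'].get(section='email', key='email_backend')
    print(email_backend)  # print the value, not the literal string "email_backend"
    return email_backend

with DAG(dag_id="email_backend_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
    email_backend_test = PythonOperator(
        task_id="email_backend_test",
        python_callable=print_var,
        provide_context=True
    )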

Example email configuration

The following Apache Airflow configuration options can be used for a Gmail account that signs in with an app password. For more information, see Sign in using app passwords in the Gmail Help reference guide.


[Image: Apache Airflow configuration options for a Gmail account on the Amazon MWAA console]
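As a text alternative to the image, the following sketch lists option values commonly used with Gmail's SMTP server, smtp.gmail.com on port 587 with STARTTLS. These values, the smtp.smtp_user and smtp.smtp_password entries, and the names `GMAIL_OPTIONS` and `tls_settings_conflict` are assumptions for illustration and may differ from what the image shows; replace the placeholder address and app password with your own.

```python
from typing import Dict

# Placeholder values; replace the address and app password with your own.
GMAIL_OPTIONS: Dict[str, str] = {
    "email.email_backend": "airflow.utils.email.send_email_smtp",
    "smtp.smtp_host": "smtp.gmail.com",
    "smtp.smtp_port": "587",
    "smtp.smtp_starttls": "True",
    "smtp.smtp_ssl": "False",
    "smtp.smtp_user": "YOUR_GMAIL_ADDRESS",
    "smtp.smtp_password": "YOUR_APP_PASSWORD",
    "smtp.smtp_mail_from": "YOUR_GMAIL_ADDRESS",
}


def tls_settings_conflict(options: Dict[str, str]) -> bool:
    """True if both smtp_ssl and smtp_starttls are enabled, which is usually a mistake."""
    def enabled(name: str) -> bool:
        return options.get(name, "False").strip().lower() == "true"

    return enabled("smtp.smtp_ssl") and enabled("smtp.smtp_starttls")
```

Port 587 is paired with STARTTLS here; an SSL-from-the-start setup would instead enable smtp.smtp_ssl, typically on port 465, and disable STARTTLS.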

What's next?