Apache Airflow configuration options - Amazon Managed Workflows for Apache Airflow

Apache Airflow configuration options

Apache Airflow configuration options can be attached to your Amazon Managed Workflows for Apache Airflow (MWAA) environment as environment variables. You can choose from the suggested dropdown list, or specify custom configuration options for your Apache Airflow version on the Amazon MWAA console. This page describes the Apache Airflow configuration options available, and how to use these options to override Apache Airflow configuration settings on your environment.

Prerequisites

You'll need the following before you can complete the steps on this page.

  1. Access. Your AWS account must have been granted access by your administrator to the AmazonMWAAFullConsoleAccess access control policy for your environment.

  2. Amazon S3 configurations. The Amazon S3 bucket used to store your DAGs, custom plugins in plugins.zip, and Python dependencies in requirements.txt must be configured with Public Access Blocked and Versioning Enabled.

  3. Permissions. Your Amazon MWAA environment must be permitted by your execution role to access the AWS resources used by your environment.

How it works

When you create an environment, Amazon MWAA attaches the configuration settings you specify on the Amazon MWAA console in Airflow configuration options as environment variables to the AWS Fargate container for your environment. If you're using a setting of the same name in airflow.cfg, the options you specify on the Amazon MWAA console override the values in airflow.cfg.
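The precedence described above can be pictured with a minimal sketch. It is not Amazon MWAA's or Apache Airflow's actual implementation. The function name resolve_option, the sample airflow.cfg text, and the override dictionary are all hypothetical, and the only behavior assumed is that an AIRFLOW__SECTION__KEY environment variable wins over the same setting in airflow.cfg.

```python
import configparser
import os


def resolve_option(section, key, cfg_text, environ=None):
    """Return section.key, preferring an environment variable over airflow.cfg text."""
    environ = os.environ if environ is None else environ
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        # An option set as an environment variable overrides the file value.
        return environ[env_name]
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(section, key)


# Hypothetical airflow.cfg content and a hypothetical console override.
sample_cfg = """
[core]
parallelism = 32
dag_concurrency = 16
"""
overrides = {"AIRFLOW__CORE__PARALLELISM": "40"}

print(resolve_option("core", "parallelism", sample_cfg, overrides))
print(resolve_option("core", "dag_concurrency", sample_cfg, overrides))
```

In this sketch the overridden option resolves to the environment variable's value, while the option without an override falls back to the value in the file.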

While we don't expose the airflow.cfg in the Apache Airflow UI of an Amazon MWAA environment, you can change the Apache Airflow configuration options directly on the Amazon MWAA console and continue using all other settings in airflow.cfg.

Using configuration options to load plugins in 2.0

By default in Apache Airflow v2.0, plugins are loaded "lazily" because core.lazy_load_plugins is set to True. If you're using custom plugins in Apache Airflow v2.0.2, you must override this default by adding core.lazy_load_plugins : False as an Airflow configuration option, so that plugins are loaded at the start of each Airflow process.

Configuration options overview

When you add a configuration on the Amazon MWAA console, Amazon MWAA writes the configuration as an environment variable.

  • Listed options. You can choose from one of the configuration settings available for your Apache Airflow version in the dropdown list. For example, core.dag_concurrency : 16. The configuration setting is translated to your environment's Fargate container as AIRFLOW__CORE__DAG_CONCURRENCY : 16.

  • Custom options. You can also specify Airflow configuration options that are not listed for your Apache Airflow version in the dropdown list. For example, foo.user : YOUR_USER_NAME. The configuration setting is translated to your environment's Fargate container as AIRFLOW__FOO__USER : YOUR_USER_NAME.
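The translation shown in the two examples above follows a simple pattern: the section and key are upper-cased and joined with double underscores after an AIRFLOW prefix. The helper below is a small sketch of that pattern only. The function name to_env_var is hypothetical and is not part of Apache Airflow or Amazon MWAA.

```python
def to_env_var(option):
    """Translate a 'section.key' option name into its AIRFLOW__SECTION__KEY form."""
    section, _, key = option.partition(".")
    if not section or not key:
        raise ValueError(f"expected 'section.key', got {option!r}")
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The two option names used as examples on this page.
print(to_env_var("core.dag_concurrency"))
print(to_env_var("foo.user"))
```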

Apache Airflow configuration options

The following image shows where you can customize the Apache Airflow configuration options on the Amazon MWAA console.



Apache Airflow reference

The following section contains links to the list of available Apache Airflow configuration options in the Apache Airflow reference guide.

Using the Amazon MWAA console

The following procedure walks you through the steps of adding an Airflow configuration option to your environment.

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose Edit.

  4. Choose Next.

  5. Choose Add custom configuration in the Airflow configuration options pane.

  6. Choose a configuration from the dropdown list and enter a value, or type a custom configuration and enter a value.

  7. Choose Add custom configuration for each configuration you want to add.

  8. Choose Save.
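The same options might also be set programmatically instead of through the console. The sketch below assumes the AWS SDK for Python (boto3) is installed and that credentials are configured. It relies on the MWAA client's update_environment operation and its AirflowConfigurationOptions parameter. The environment name MyAirflowEnvironment and both helper function names are hypothetical.

```python
def build_update_request(environment_name, options):
    """Build keyword arguments for an MWAA UpdateEnvironment call.

    Option values are converted to strings because the options are
    passed as a string-to-string map.
    """
    return {
        "Name": environment_name,
        "AirflowConfigurationOptions": {k: str(v) for k, v in options.items()},
    }


def apply_options(environment_name, options):
    """Send the request. Requires boto3 and AWS credentials (not run in this sketch)."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("mwaa")
    return client.update_environment(**build_update_request(environment_name, options))


# Hypothetical environment name; option names are taken from this page.
request = build_update_request(
    "MyAirflowEnvironment",
    {"core.default_task_retries": 3, "core.parallelism": 40},
)
print(request)
```

Only build_update_request runs here. Calling apply_options would start an environment update, so review the request before sending it.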

Configuration reference

The following section contains the list of available Apache Airflow configurations in the dropdown list on the Amazon MWAA console.

Email configurations

The following list shows the Airflow email notification configuration options available on Amazon MWAA.

We recommend using port 587 for SMTP traffic. By default, AWS blocks outbound SMTP traffic on port 25 of all Amazon EC2 instances. If you want to send outbound traffic on port 25, you can request that this restriction be removed.

Airflow v2.0.2

Airflow version | Airflow configuration option | Description | Example value
v2.0.2 | email.email_backend | The Apache Airflow utility used for email notifications in email_backend. | airflow.utils.email.send_email_smtp
v2.0.2 | smtp.smtp_host | The name of the outbound server used for the email address in smtp_host. | localhost
v2.0.2 | smtp.smtp_starttls | Transport Layer Security (TLS) is used to encrypt the email over the Internet in smtp_starttls. | False
v2.0.2 | smtp.smtp_ssl | Secure Sockets Layer (SSL) is used to connect the server and email client in smtp_ssl. | True
v2.0.2 | smtp.smtp_port | The Transmission Control Protocol (TCP) port designated to the server in smtp_port. | 587
v2.0.2 | smtp.smtp_mail_from | The outbound email address in smtp_mail_from. | myemail@domain.com

Airflow v1.10.12

Airflow version | Airflow configuration option | Description | Example value
v1.10.12 | email.email_backend | The Apache Airflow utility used for email notifications in email_backend. | airflow.utils.email.send_email_smtp
v1.10.12 | smtp.smtp_host | The name of the outbound server used for the email address in smtp_host. | localhost
v1.10.12 | smtp.smtp_starttls | Transport Layer Security (TLS) is used to encrypt the email over the Internet in smtp_starttls. | False
v1.10.12 | smtp.smtp_ssl | Secure Sockets Layer (SSL) is used to connect the server and email client in smtp_ssl. | True
v1.10.12 | smtp.smtp_port | The Transmission Control Protocol (TCP) port designated to the server in smtp_port. | 587
v1.10.12 | smtp.smtp_mail_from | The outbound email address in smtp_mail_from. | myemail@domain.com
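The following sketch shows one way the smtp_ssl, smtp_starttls, smtp_host, and smtp_port options could map onto Python's standard smtplib module. It assumes that a true smtp_ssl selects an implicit-SSL client and that a true smtp_starttls upgrades a plain connection after connecting. The helper names are hypothetical, no network connection is opened, and this is not Apache Airflow's own email code.

```python
import smtplib


def as_bool(value):
    """Interpret the 'True' / 'False' strings used for option values."""
    return str(value).strip().lower() == "true"


def plan_smtp_connection(options):
    """Describe the SMTP client implied by the options, without connecting."""
    use_ssl = as_bool(options.get("smtp.smtp_ssl", "False"))
    return {
        # SMTP_SSL wraps the connection in SSL/TLS from the first byte.
        "client_class": smtplib.SMTP_SSL if use_ssl else smtplib.SMTP,
        "host": options["smtp.smtp_host"],
        "port": int(options["smtp.smtp_port"]),
        # When true, a plain connection would be upgraded with STARTTLS.
        "starttls": as_bool(options.get("smtp.smtp_starttls", "False")),
    }


# Example values taken from the tables above.
example_options = {
    "smtp.smtp_host": "localhost",
    "smtp.smtp_starttls": "False",
    "smtp.smtp_ssl": "True",
    "smtp.smtp_port": "587",
}
plan = plan_smtp_connection(example_options)
print(plan["client_class"].__name__, plan["host"], plan["port"], plan["starttls"])
```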

Task configurations

The following list shows the configurations available in the dropdown list for Airflow tasks on Amazon MWAA.

Airflow v2.0.2

Airflow version | Airflow configuration option | Description | Example value
v2.0.2 | core.default_task_retries | The number of times to retry an Apache Airflow task in default_task_retries. | 3
v2.0.2 | core.parallelism | The maximum number of task instances that can run simultaneously across the entire environment in parallel (parallelism). | 40

Airflow v1.10.12

Airflow version | Airflow configuration option | Description | Example value
v1.10.12 | core.default_task_retries | The number of times to retry an Apache Airflow task in default_task_retries. | 3
v1.10.12 | core.parallelism | The maximum number of task instances that can run simultaneously across the entire environment in parallel (parallelism). | 40

Scheduler configurations

The following list shows the Airflow scheduler configurations available in the dropdown list on Amazon MWAA.

Airflow v2.0.2

Airflow version | Airflow configuration option | Description | Example value
v2.0.2 | scheduler.catchup_by_default | Tells the scheduler to create a DAG run to "catch up" to the specific time interval in catchup_by_default. | False
v2.0.2 | scheduler.scheduler_zombie_task_threshold | The number of seconds without a task heartbeat after which the scheduler marks the task instance as failed and reschedules the task, set in scheduler_zombie_task_threshold. | 300

Airflow v1.10.12

Airflow version | Airflow configuration option | Description | Example value
v1.10.12 | scheduler.catchup_by_default | Tells the scheduler to create a DAG run to "catch up" to the specific time interval in catchup_by_default. | False
v1.10.12 | scheduler.scheduler_zombie_task_threshold | The number of seconds without a task heartbeat after which the scheduler marks the task instance as failed and reschedules the task, set in scheduler_zombie_task_threshold. | 300
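The zombie threshold can be thought of as a timeout on task heartbeats. The sketch below illustrates that idea only. The function is_zombie and the timestamps are hypothetical, and the scheduler's real logic involves more state than a single comparison.

```python
from datetime import datetime, timedelta


def is_zombie(last_heartbeat, now, threshold_seconds=300):
    """Return True when no heartbeat has been seen within the threshold."""
    return (now - last_heartbeat) > timedelta(seconds=threshold_seconds)


# Hypothetical timestamps: one task heartbeated 2 minutes ago, another 301 seconds ago.
now = datetime(2021, 6, 1, 12, 0, 0)
print(is_zombie(now - timedelta(seconds=120), now))
print(is_zombie(now - timedelta(seconds=301), now))
```

With the example value of 300, the first task is still considered alive and the second would be treated as a zombie.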

Worker configurations

The following list shows the Airflow worker configurations available in the dropdown list on Amazon MWAA.

Airflow v2.0.2

Airflow version | Airflow configuration option | Description | Example value
v2.0.2 | celery.worker_autoscale | The maximum and minimum number of tasks that can run concurrently on any worker using the Celery Executor in worker_autoscale. Value must be comma-separated in the following order: max_concurrency,min_concurrency. | 16,12

Airflow v1.10.12

Airflow version | Airflow configuration option | Description | Example value
v1.10.12 | celery.worker_autoscale | The maximum and minimum number of tasks that can run concurrently on any worker using the Celery Executor in worker_autoscale. Value must be comma-separated in the following order: max_concurrency,min_concurrency.  | 16,12
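Because the order of the two numbers in worker_autoscale is easy to reverse, it may help to validate a value before entering it. The helper below is a sketch under the format stated above (max_concurrency,min_concurrency). The function name parse_worker_autoscale is hypothetical, and the checks it applies are assumptions rather than rules enforced by Amazon MWAA.

```python
def parse_worker_autoscale(value):
    """Parse 'max_concurrency,min_concurrency' and check the ordering."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 2:
        raise ValueError(f"expected 'max,min', got {value!r}")
    max_concurrency, min_concurrency = int(parts[0]), int(parts[1])
    if min_concurrency < 0 or max_concurrency < min_concurrency:
        raise ValueError(
            f"max_concurrency ({max_concurrency}) must be >= "
            f"min_concurrency ({min_concurrency}) >= 0"
        )
    return max_concurrency, min_concurrency


print(parse_worker_autoscale("16,12"))
```

A reversed value such as 12,16 raises a ValueError in this sketch instead of being accepted silently.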

Web server configurations

The following list shows the Airflow web server configurations available in the dropdown list on Amazon MWAA.

Airflow v2.0.2

Airflow version | Airflow configuration option | Description | Example value
v2.0.2 | webserver.default_ui_timezone | The default Apache Airflow UI datetime setting in default_ui_timezone. | America/New_York

Note

Setting the default_ui_timezone option does not change the time zone in which your DAGs are scheduled to run. To change the time zone for your DAGs, you can use a custom plugin. For more information, see Custom plugin to change the DAG schedule timezone.

Airflow v1.10.12

Airflow version | Airflow configuration option | Description | Example value
v1.10.12 | webserver.default_ui_timezone | The default Apache Airflow UI datetime setting in default_ui_timezone. | America/New_York

Examples and sample code

Example DAG

You can use the following DAG to print the value of your email_backend Apache Airflow configuration option. To run in response to Amazon MWAA events, copy the code to your environment's DAGs folder on your Amazon S3 storage bucket.

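from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

# default_args is not defined in the original snippet; these values are assumed.
default_args = {"owner": "airflow", "start_date": days_ago(1)}

def print_var(**kwargs):
    # Read the email_backend option from the [email] section of the Airflow configuration.
    email_backend = kwargs['conf'].get(section='email', key='email_backend')
    print(email_backend)
    return email_backend

with DAG(dag_id="email_backend_dag", schedule_interval="@once", default_args=default_args, catchup=False) as dag:
    email_backend_test = PythonOperator(
        task_id="email_backend_test",
        python_callable=print_var,
        provide_context=True
    )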

Example email notification settings

The following Apache Airflow configuration options can be used for a Gmail.com email account using an app password. For more information, see Sign in using app passwords in the Gmail Help reference guide.


          This image shows how to configure a gmail.com email account using Apache Airflow configuration options on the MWAA console.

What's next?