
Customizing Apache Airflow configurations

The first time you run Apache Airflow, it creates an airflow.cfg configuration file in your $AIRFLOW_HOME directory and attaches the configurations to your environment as environment variables. This guide describes the Apache Airflow configuration options on the Amazon Managed Workflows for Apache Airflow (MWAA) console and how to use these options to override Apache Airflow configuration settings in your environment.


How it works


The universal order of precedence for all Apache Airflow configuration options is as follows:

  1. As an environment variable: AIRFLOW__CORE__SQL_ALCHEMY_CONN

  2. As a command environment variable: AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD

  3. As a secret environment variable: AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET

  4. As a setting in: airflow.cfg

  5. As a command in: airflow.cfg

  6. As a secret key in: airflow.cfg

  7. Using Apache Airflow's built-in defaults
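As the precedence list shows, an option's environment-variable name is derived from its section and key: AIRFLOW__{SECTION}__{KEY}, upper-cased, with an optional _CMD or _SECRET suffix for the command and secret variants. A minimal sketch of that naming rule (the helper function is illustrative, not part of Apache Airflow):

```python
def airflow_env_var(section, key, suffix=""):
    """Build the environment-variable name Apache Airflow checks for a
    configuration option, e.g. ('core', 'sql_alchemy_conn') ->
    AIRFLOW__CORE__SQL_ALCHEMY_CONN. An optional suffix selects the
    _CMD or _SECRET variant, which are checked in that order after the
    plain variable. Illustrative helper only."""
    return "AIRFLOW__{}__{}{}".format(section.upper(), key.upper(), suffix)

print(airflow_env_var("core", "sql_alchemy_conn"))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN
print(airflow_env_var("core", "sql_alchemy_conn", "_CMD"))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN_CMD
```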

While we don't expose the airflow.cfg file in the Apache Airflow UI of an environment, you can change the default Apache Airflow configuration options directly on the Amazon MWAA console and continue using all other settings in airflow.cfg.

Using Apache Airflow configuration options on the console

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose the environment you created.

  3. Choose Edit.

  4. Choose Next.

  5. Choose Add custom configuration in the Airflow configuration options pane.

  6. Choose an Apache Airflow configuration from the drop-down list and enter a value.
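The same options can also be applied outside the console with the AWS CLI, whose mwaa update-environment command accepts an --airflow-configuration-options map of option names to values. A sketch, assuming an existing environment named MyEnvironment (a placeholder):

```shell
# Set Apache Airflow configuration options on an existing Amazon MWAA
# environment. "MyEnvironment" is a placeholder name.
aws mwaa update-environment \
    --name MyEnvironment \
    --airflow-configuration-options '{"email.email_backend": "airflow.utils.email.send_email_smtp", "core.default_task_retries": "3"}'
```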

Configuration sample code

The following DAG contains a PythonOperator that prints the email_backend configuration option. For it to run in response to Amazon MWAA events, copy it to your environment's DAGs folder in your Amazon S3 storage bucket. For more information, see Working with DAGs.

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'start_date': days_ago(1),
}

def print_var(**kwargs):
    # Read the [email] email_backend option from the Airflow configuration
    # object passed in the task context.
    email_backend = kwargs['conf'].get(section='email', key='email_backend')
    print(email_backend)
    return email_backend

with DAG(dag_id="email_backend_dag", schedule_interval="@once",
         default_args=default_args, catchup=False) as dag:
    email_backend_test = PythonOperator(
        task_id="email_backend_test",
        python_callable=print_var,
        provide_context=True
    )

Configuration reference

The following section contains the list of available Apache Airflow configurations on the Amazon MWAA console.

Email notifications

The following list shows the email notification configuration options available on the Amazon MWAA console. Each entry shows the Amazon MWAA UI selection, the full Apache Airflow configuration option in parentheses, a description, and an example value.

  email_backend (email.email_backend)
  The Apache Airflow utility used for email notifications.
  Example value: airflow.utils.email.send_email_smtp

  smtp_host (smtp.smtp_host)
  The name of the outbound SMTP server used to send email.
  Example value: localhost

  smtp_starttls (smtp.smtp_starttls)
  Whether to use Transport Layer Security (TLS) via STARTTLS to encrypt email in transit.
  Example value: False

  smtp_ssl (smtp.smtp_ssl)
  Whether to use Secure Sockets Layer (SSL) for the connection between the server and email client.
  Example value: True

  smtp_port (smtp.smtp_port)
  The Transmission Control Protocol (TCP) port of the SMTP server.
  Example value: 25

  smtp_mail_from (smtp.smtp_mail_from)
  The outbound (From) email address.
  Example value: myemail@domain.com
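The smtp_* options above map onto a standard SMTP client setup: smtp_ssl selects implicit TLS, smtp_starttls selects an upgraded plain connection, and smtp_port picks the TCP port. The following pure-Python sketch (illustrative only, not Amazon MWAA or Apache Airflow code) shows how those settings would typically drive the connection choice:

```python
def describe_smtp_connection(smtp_ssl, smtp_starttls, smtp_port):
    """Summarize how an SMTP client would connect given the smtp_ssl,
    smtp_starttls, and smtp_port options. Illustrative helper only."""
    if smtp_ssl:
        # Implicit TLS: the socket is encrypted from the start.
        return "SMTPS on port {}".format(smtp_port)
    if smtp_starttls:
        # Plain connection upgraded to TLS via the STARTTLS command.
        return "SMTP with STARTTLS on port {}".format(smtp_port)
    return "unencrypted SMTP on port {}".format(smtp_port)

print(describe_smtp_connection(smtp_ssl=True, smtp_starttls=False, smtp_port=465))
# SMTPS on port 465
print(describe_smtp_connection(smtp_ssl=False, smtp_starttls=True, smtp_port=587))
# SMTP with STARTTLS on port 587
```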

Task configurations

The following list shows the configuration options available for tasks on the Amazon MWAA console.

  default_task_retries (core.default_task_retries)
  The default number of times to retry an Apache Airflow task.
  Example value: 3

  parallelism (core.parallelism)
  The maximum number of task instances that can run simultaneously across the environment.
  Example value: 40

Scheduler configurations

The following list shows the scheduler configuration options available on the Amazon MWAA console.

  catchup_by_default (scheduler.catchup_by_default)
  Whether the scheduler creates DAG runs to "catch up" on schedule intervals that have already passed.
  Example value: False

  scheduler_zombie_task_threshold (scheduler.scheduler_zombie_task_threshold)
  The number of seconds without a heartbeat after which the scheduler marks a running task instance as failed and reschedules the task.
  Example value: 300
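The zombie threshold is a time window: a running task whose last heartbeat is older than the threshold is treated as a zombie and failed. A simplified sketch of that check (illustrative, not the scheduler's actual implementation):

```python
def is_zombie(last_heartbeat_ts, now_ts, threshold_seconds=300):
    """Return True if a running task instance's last heartbeat is older
    than scheduler_zombie_task_threshold seconds. Simplified illustration
    of zombie detection; not Apache Airflow's actual code."""
    return (now_ts - last_heartbeat_ts) > threshold_seconds

print(is_zombie(last_heartbeat_ts=1000, now_ts=1200))  # False: 200 s since heartbeat
print(is_zombie(last_heartbeat_ts=1000, now_ts=1400))  # True: 400 s since heartbeat
```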

Worker configurations

The following list shows the configuration options available for workers on the Amazon MWAA console.

  worker_concurrency (celery.worker_concurrency)
  The number of task instances that a worker runs concurrently.
  Example value: 20

System settings

The following list shows the configuration options available for Apache Airflow system settings on the Amazon MWAA console.

  default_ui_timezone (core.default_ui_timezone)
  The default timezone for dates and times displayed in the Apache Airflow UI.
  Example value: America/New_York