
Installing Python dependencies

An "extra" is a Python subpackage that is not included in the Apache Airflow base install (apache-airflow==1.10.12) on your Amazon Managed Workflows for Apache Airflow (MWAA) environment. It is referred to throughout this page as a Python dependency. This page describes the steps to install Apache Airflow Python dependencies on your Amazon MWAA environment using a requirements.txt file.

Prerequisites

To use the steps on this page, you'll need:

  1. The required AWS resources configured for your environment as defined in Get started with Amazon Managed Workflows for Apache Airflow (MWAA).

  2. An execution role with a permissions policy that grants Amazon MWAA access to the AWS resources used by your environment as defined in Amazon MWAA Execution role.

  3. An AWS account with access in AWS Identity and Access Management (IAM) to the Amazon S3 console, or the AWS Command Line Interface (AWS CLI) as defined in Accessing an Amazon MWAA environment.

How it works

Amazon MWAA runs pip3 install -r requirements.txt using the requirements file that you specify for your environment, on the Apache Airflow Scheduler and each of the Workers.

To install Python dependencies on your environment, you must do three things:

  1. Create a requirements.txt file locally.

  2. Upload the local requirements.txt to your Amazon S3 bucket.

  3. Specify the version of this file in the Requirements file field on the Amazon MWAA console.

Note

If this is the first time you're creating and uploading a requirements.txt to your Amazon S3 bucket, you'll also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.

Python dependencies location and size limits

The Apache Airflow Scheduler and the Workers look for the requirements file during startup on the AWS-managed Fargate container for your environment at /usr/local/airflow/requirements/requirements.txt.

  • Size limit. We recommend a requirements.txt file that references libraries whose combined size is less than 1 GB. The more libraries Amazon MWAA needs to install, the longer the startup time on an environment. Although Amazon MWAA doesn't explicitly limit the size of installed libraries, if dependencies can't be installed within ten minutes, the Fargate service times out and attempts to roll back the environment to a stable state.

Creating a requirements.txt

This section describes how to specify Apache Airflow packages for Apache Airflow v1.10.12, and Python libraries on the Python Package Index (PyPI).

  • Apache Airflow packages. If your Apache Airflow platform uses Apache Airflow packages, specify the package name and the Apache Airflow version in your requirements.txt. To see a list of the Apache Airflow packages for Apache Airflow v1.10.12, see Extra Packages in the Apache Airflow reference guide. For example:

    apache-airflow[package1,package2]==1.10.12

    Example for Secure Shell (SSH)

    The following example requirements.txt file installs SSH for Apache Airflow v1.10.12.

    apache-airflow[ssh]==1.10.12

    For an example of a requirements.txt with Apache Airflow v1.10.12 Backport Providers, see Amazon Managed Workflows for Apache Airflow (MWAA) and Amazon EMR on GitHub.

  • Python libraries. If your Apache Airflow platform uses standard Python libraries on the Python Package Index (PyPI), we recommend specifying a specific version (==) in your requirements.txt file to prevent future releases that may be incompatible. For example:

    library == version

    Example for Boto3

    The following example requirements.txt file installs the Boto3 library.

    boto3 == 1.17.4

    If a package is specified without a version, Amazon MWAA installs the latest version of the package from PyPI.org, and that version may conflict with other packages in your requirements.txt. Pinning a specific version also prevents a future breaking release on PyPI.org from being applied automatically.

You can specify Python wheels, Apache Airflow packages, Python libraries on the Python Package Index (PyPI), or Python dependencies hosted on a private PyPI/PEP 503 compliant repository in your requirements.txt. To learn more, see Managing Python dependencies in requirements.txt.
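
For example, a requirements.txt that combines an Apache Airflow extra, a Backport Provider package, and a pinned PyPI library might look like the following sketch. The package names and versions shown are illustrative; substitute the releases that your workflows actually require.

    # Apache Airflow extra, pinned to the Airflow version of the environment
    apache-airflow[ssh]==1.10.12
    # Backport Provider package (pin to a specific release in practice)
    apache-airflow-backport-providers-amazon
    # Standard PyPI library, pinned to avoid incompatible future releases
    boto3==1.17.4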

Uploading requirements.txt to Amazon S3

You can use the Amazon S3 console or the AWS Command Line Interface (AWS CLI) to upload a requirements.txt file to your Amazon S3 bucket.

Using the AWS CLI

The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command-line shell. To complete the steps in this section, you must have the AWS CLI installed and configured with credentials that can access your Amazon S3 bucket.

To upload using the AWS CLI

  1. Use the following command to list all of your Amazon S3 buckets.

    aws s3 ls
  2. Use the following command to list the files and folders in the Amazon S3 bucket for your environment.

    aws s3 ls s3://YOUR_S3_BUCKET_NAME
  3. The following command uploads a requirements.txt file to an Amazon S3 bucket.

    aws s3 cp requirements.txt s3://your-s3-bucket-any-name/requirements.txt
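
If versioning is enabled on your Amazon S3 bucket, each upload creates a new object version. The following sketch shows how you can retrieve the version ID of the uploaded file with the s3api command; the bucket name is a placeholder. The version ID is useful when you later associate that specific requirements.txt version with your environment.

    aws s3api list-object-versions --bucket your-s3-bucket-any-name --prefix requirements.txt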

Using the Amazon S3 console

The Amazon S3 console is a web-based user interface that allows you to create and manage the resources in your Amazon S3 bucket.

To upload using the Amazon S3 console

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Select the S3 bucket link in the DAG code in S3 pane to open your storage bucket on the Amazon S3 console.

  4. Choose Upload.

  5. Choose Add file.

  6. Select the local copy of your requirements.txt, then choose Upload.

Specifying the path to requirements.txt on the Amazon MWAA console (the first time)

If this is the first time you're uploading a requirements.txt to your Amazon S3 bucket, you'll also need to specify the path to the file on the Amazon MWAA console. You only need to complete this step once.

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose Edit.

  4. On the DAG code in Amazon S3 pane, choose Browse S3 next to the Requirements file - optional field.

  5. Select the requirements.txt file on your Amazon S3 bucket.

  6. Choose Choose.

  7. Choose Next, Update environment.

You can begin using the new packages immediately after your environment finishes updating.

Specifying the requirements.txt version on the Amazon MWAA console

You need to specify the version of your requirements.txt file on the Amazon MWAA console each time you upload a new version of your requirements.txt in your Amazon S3 bucket.

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose Edit.

  4. On the DAG code in Amazon S3 pane, choose a requirements.txt version in the dropdown list.

  5. Choose Next, Update environment.

You can begin using the new packages immediately after your environment finishes updating.
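
If you prefer the AWS CLI to the console, you can associate the file and its object version with your environment using the mwaa update-environment command. The following is a minimal sketch; the environment name, path, and object version ID are placeholders, and you should confirm the option names with aws mwaa update-environment help for your AWS CLI version.

    aws mwaa update-environment \
        --name YOUR_ENVIRONMENT_NAME \
        --requirements-s3-path requirements.txt \
        --requirements-s3-object-version YOUR_S3_OBJECT_VERSION_ID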

Viewing logs for your requirements.txt

You can view Apache Airflow logs for the Scheduler scheduling your workflows and parsing your dags folder. The following steps describe how to open the log group for the Scheduler on the Amazon MWAA console, and view Apache Airflow logs on the CloudWatch Logs console.

To view logs for a requirements.txt

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose the Airflow scheduler log group on the Monitoring pane.

  4. Choose the requirements_install_ip log in Log streams.

  5. You should see the list of packages from /usr/local/airflow/requirements/requirements.txt that were installed on the environment. For example:

    Collecting appdirs==1.4.4 (from -r /usr/local/airflow/requirements/requirements.txt (line 1))
      Downloading https://files.pythonhosted.org/packages/.../appdirs-1.4.4-py2.py3-none-any.whl
    Collecting astroid==2.4.2 (from -r /usr/local/airflow/requirements/requirements.txt (line 2))
  6. Review the list of packages and check whether any of them encountered an error during installation. If something went wrong, you may see an error similar to the following:

    2021-03-05T14:34:42.731-07:00 No matching distribution found for LibraryName==1.0.0 (from -r /usr/local/airflow/requirements/requirements.txt (line 4))
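
You can also read these log streams from the command line. The following sketch uses the aws logs tail command, which is available in AWS CLI version 2; the log group name is a placeholder and follows the pattern shown on the Monitoring pane of your environment.

    aws logs tail airflow-YOUR_ENVIRONMENT_NAME-Scheduler --follow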

Viewing changes on your Apache Airflow UI

To access your Apache Airflow UI

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Choose Open Airflow UI.

Note

You may need to ask your account administrator to add AmazonMWAAWebServerAccess permissions for your account to view your Apache Airflow UI. For more information, see Managing access.