
Deleting files on Amazon S3

This page describes how versioning works in an Amazon S3 bucket for an Amazon Managed Workflows for Apache Airflow environment, and the steps to delete a DAG, plugins.zip, or requirements.txt file.

Prerequisites

You'll need the following before you can complete the steps on this page.

  • Permissions — Your AWS account must have been granted access by your administrator to the AmazonMWAAFullConsoleAccess access control policy for your environment. In addition, your Amazon MWAA environment must be permitted by your execution role to access the AWS resources used by your environment.

  • Access — If you require access to public repositories to install dependencies directly on the web server, your environment must be configured with public network web server access. For more information, see Apache Airflow access modes.

  • Amazon S3 configuration — The Amazon S3 bucket used to store your DAGs, custom plugins in plugins.zip, and Python dependencies in requirements.txt must be configured with Public Access Blocked and Versioning Enabled.
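If you want to confirm the versioning setting from a script, the following boto3 sketch (with a hypothetical bucket name) checks whether versioning is enabled on the bucket; the Status field should read "Enabled" for a correctly configured bucket.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; replace with the bucket used by your environment.
versioning = s3.get_bucket_versioning(Bucket="my-mwaa-bucket")
print(versioning.get("Status"))  # Should print "Enabled"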

Versioning overview

The requirements.txt and plugins.zip files in your Amazon S3 bucket are versioned. When versioning is enabled on an Amazon S3 bucket and an artifact (for example, plugins.zip) is deleted, the file doesn't get deleted entirely. Instead, Amazon S3 inserts a delete marker, a zero-byte placeholder that becomes the current version of the object and returns a 404 (Object not found) error when the object is requested. A delete marker has a key name (or key) and version ID like any other object, but it has no data.

We recommend deleting file versions and delete markers periodically to reduce storage costs for your Amazon S3 bucket. To remove "non-current" (previous) file versions entirely, you must delete the file versions themselves, and then delete the delete marker that was created for each one.
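For example, a sketch like the following uses boto3 to list the stored versions and delete markers for the requirements.txt object; the bucket name is a hypothetical placeholder.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; replace with the bucket used by your environment.
response = s3.list_object_versions(Bucket="my-mwaa-bucket", Prefix="requirements.txt")

# Stored versions of the object, including "non-current" (previous) versions.
for version in response.get("Versions", []):
    print("version", version["VersionId"], "latest:", version["IsLatest"])

# Delete markers created when a version was deleted.
for marker in response.get("DeleteMarkers", []):
    print("delete marker", marker["VersionId"], "latest:", marker["IsLatest"])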

How it works

Amazon MWAA runs a sync operation on your Amazon S3 bucket every thirty seconds. This causes any DAG deletions in an Amazon S3 bucket to be synced to the Airflow image of your Fargate container.

For plugins.zip and requirements.txt files, changes take effect only after an environment update, when Amazon MWAA builds a new Airflow image of your Fargate container with the custom plugins and Python dependencies. If you delete the current version of a requirements.txt or plugins.zip file and then update your environment without providing a new version for the deleted file, the update fails with an error message such as "Unable to read version {version} of file {file}".
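To avoid that error, an environment update needs to reference a version of the file that still exists. The following boto3 sketch (hypothetical environment name and version ID) shows one way to pin the requirements.txt version that the environment uses.

import boto3

mwaa = boto3.client("mwaa")

# Hypothetical environment name and Amazon S3 object version ID.
mwaa.update_environment(
    Name="MyAirflowEnvironment",
    RequirementsS3Path="requirements.txt",
    RequirementsS3ObjectVersion="example-version-id",
)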

Deleting a DAG on Amazon S3

A DAG file (.py) is not versioned and can be deleted directly on the Amazon S3 console. The following steps describe how to delete a DAG on your Amazon S3 bucket.

To delete a DAG
  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Select the S3 bucket link in the DAG code in S3 pane to open your storage bucket on the Amazon S3 console.

  4. Choose the dags folder.

  5. Select the DAG, then choose Delete.

  6. Under Delete objects?, type delete.

  7. Choose Delete objects.

Note

Apache Airflow preserves historical DAG runs. After a DAG has been run in Apache Airflow, it remains in the Airflow DAGs list regardless of the file status, until you delete it in Apache Airflow. To delete a DAG in Apache Airflow, choose the red "delete" button under the Links column.
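If you prefer to script the deletion instead of using the console steps above, a minimal boto3 sketch (hypothetical bucket and DAG file names) looks like the following.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and DAG file names; replace with your own values.
s3.delete_object(Bucket="my-mwaa-bucket", Key="dags/my_dag.py")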

Removing a "current" requirements.txt or plugins.zip from an environment

Currently, there isn't a way to remove a plugins.zip or requirements.txt from an environment after they’ve been added, but we're working on the issue. In the interim, a workaround is to point to an empty text or zip file, respectively.
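As a rough sketch of that workaround for requirements.txt (hypothetical bucket and environment names), you could upload an empty file and then update the environment to use its new version ID.

import boto3

s3 = boto3.client("s3")
mwaa = boto3.client("mwaa")

# Hypothetical names; replace with your own bucket and environment.
bucket = "my-mwaa-bucket"

# Upload an empty requirements.txt; the response includes its new version ID.
response = s3.put_object(Bucket=bucket, Key="requirements.txt", Body=b"")

# Point the environment at the empty file's version.
mwaa.update_environment(
    Name="MyAirflowEnvironment",
    RequirementsS3Path="requirements.txt",
    RequirementsS3ObjectVersion=response["VersionId"],
)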

Deleting a "non-current" (previous) requirements.txt or plugins.zip version

The requirements.txt and plugins.zip files in your Amazon S3 bucket are versioned on Amazon MWAA. If you want to delete these files from your Amazon S3 bucket entirely, you must retrieve the version ID of the current version (for example, 121212) of the object (for example, plugins.zip), delete that version, and then remove the delete marker for the file version(s).

You can also delete "non-current" (previous) file versions on the Amazon S3 console; however, you'll still need to delete the delete marker using one of the following options.
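The following boto3 sketch (hypothetical bucket, key, and version IDs) shows both deletions: first a stored version of plugins.zip, then the delete marker for it. You can find the real version IDs with list_object_versions, as shown earlier.

import boto3

s3 = boto3.client("s3")

# Hypothetical values; look up the actual version IDs first.
bucket = "my-mwaa-bucket"
key = "plugins.zip"

# Permanently delete a specific stored version of the object.
s3.delete_object(Bucket=bucket, Key=key, VersionId="121212")

# Then delete the delete marker itself, also by its version ID.
s3.delete_object(Bucket=bucket, Key=key, VersionId="example-delete-marker-version-id")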

Using lifecycles to delete "non-current" (previous) versions and delete markers automatically

You can configure a lifecycle policy for your Amazon S3 bucket to delete "non-current" (previous) versions of the plugins.zip and requirements.txt files in your Amazon S3 bucket after a certain number of days, or to remove an expired object's delete marker.

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Under DAG code in Amazon S3, choose your Amazon S3 bucket.

  4. Choose Create lifecycle rule.

Example lifecycle policy to delete requirements.txt "non-current" versions and delete markers automatically

The following example shows how to create a lifecycle rule that permanently deletes "non-current" versions of a requirements.txt file and their delete markers after thirty days.

  1. Open the Environments page on the Amazon MWAA console.

  2. Choose an environment.

  3. Under DAG code in Amazon S3, choose your Amazon S3 bucket.

  4. Choose Create lifecycle rule.

  5. In Lifecycle rule name, type Delete previous requirements.txt versions and delete markers after thirty days.

  6. In Prefix, type requirements.

  7. In Lifecycle rule actions, choose Permanently delete previous versions of objects and Delete expired delete markers or incomplete multipart uploads.

  8. In Number of days after objects become previous versions, type 30.

  9. In Expired object delete markers, choose Delete expired object delete markers. Objects are permanently deleted after 30 days.
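The same rule can also be applied programmatically. The following boto3 sketch (hypothetical bucket name) sets a lifecycle configuration that expires previous requirements.txt versions after 30 days and removes expired delete markers. Note that this call replaces any lifecycle rules already configured on the bucket.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; this call replaces the bucket's existing lifecycle rules.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-mwaa-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "Delete previous requirements.txt versions and delete markers after thirty days",
                "Filter": {"Prefix": "requirements"},
                "Status": "Enabled",
                # Permanently delete previous versions 30 days after they become non-current.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                # Remove delete markers once no versions remain behind them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)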

What's next?