Deleting files on Amazon S3
This page describes how versioning works in an Amazon S3 bucket for an Amazon Managed Workflows for Apache Airflow (Amazon MWAA) environment, and the steps to delete a DAG, plugins.zip, or requirements.txt file.
Contents
- Prerequisites
- Versioning overview
- How it works
- Deleting a DAG on Amazon S3
- Removing a "current" requirements.txt or plugins.zip from an environment
- Deleting a "non-current" (previous) requirements.txt or plugins.zip version
- Using lifecycles to delete "non-current" (previous) versions and delete markers automatically
- Example lifecycle policy to delete requirements.txt "non-current" versions and delete markers automatically
- What's next?
Prerequisites
You'll need the following before you can complete the steps on this page.
- Permissions: Your administrator must grant your AWS account access to the AmazonMWAAFullConsoleAccess access control policy for your environment. In addition, your environment's execution role must permit your Amazon MWAA environment to access the AWS resources it uses.
- Access: If you need access to public repositories to install dependencies directly on the web server, your environment must be configured with public network web server access. For more information, see Apache Airflow access modes.
- Amazon S3 configuration: The Amazon S3 bucket used to store your DAGs, custom plugins in plugins.zip, and Python dependencies in requirements.txt must be configured with Public Access Blocked and Versioning Enabled.
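If you manage the bucket with scripts, you can check the Amazon S3 configuration prerequisite programmatically. The following is a minimal sketch, assuming s3_client is a boto3 S3 client (for example, boto3.client("s3")); the function name check_bucket_prerequisites is hypothetical. It reads only the bucket-level settings and does not inspect account-level Block Public Access.

```python
REQUIRED_BLOCK_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def check_bucket_prerequisites(s3_client, bucket):
    """Return a list of problems; an empty list means both prerequisites are met.

    Assumes s3_client is a boto3 S3 client (hypothetical helper, not an AWS API).
    """
    problems = []

    # get_bucket_versioning returns {"Status": "Enabled"} when versioning is on.
    # The "Status" key is missing if versioning was never configured.
    versioning = s3_client.get_bucket_versioning(Bucket=bucket)
    if versioning.get("Status") != "Enabled":
        problems.append("Versioning is not enabled")

    # get_public_access_block raises an error if the bucket has no Block Public
    # Access configuration at all; this sketch lets that error reach the caller.
    config = s3_client.get_public_access_block(Bucket=bucket)[
        "PublicAccessBlockConfiguration"
    ]
    for setting in REQUIRED_BLOCK_SETTINGS:
        if not config.get(setting, False):
            problems.append(f"{setting} is not enabled")
    return problems
```

An empty list means both checks passed, for example check_bucket_prerequisites(boto3.client("s3"), "my-mwaa-bucket"), where my-mwaa-bucket is a placeholder bucket name.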
Versioning overview
The requirements.txt and plugins.zip files in your Amazon S3 bucket are versioned. When versioning is enabled on an Amazon S3 bucket and you delete an artifact (for example, plugins.zip), the file isn't deleted entirely. Instead, Amazon S3 adds a new, empty placeholder on top of the existing versions, and a simple request for the file now returns a 404 (Object not found) error, in effect saying "I'm not here." Amazon S3 calls this placeholder a delete marker. A delete marker has a key name (or key) and version ID like any other object, but no data.
We recommend deleting file versions and delete markers periodically to reduce storage costs for your Amazon S3 bucket. To delete "non-current" (previous) file versions entirely, you must delete the versions of the file(s), and then the delete marker for the file.
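To make the delete-marker behavior concrete, the following toy model keeps an in-memory history of versions for each key. It is a hypothetical illustration of the semantics described above, not the Amazon S3 API: the class ToyVersionedBucket and its version IDs are invented for this example.

```python
import itertools


class ToyVersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._counter = itertools.count(1)
        # key -> list of (version_id, data), oldest first; data None is a delete marker
        self.history = {}

    def put(self, key, data):
        """Store data as a new current version and return its version ID."""
        version_id = f"v{next(self._counter)}"
        self.history.setdefault(key, []).append((version_id, data))
        return version_id

    def delete(self, key, version_id=None):
        """A simple delete adds a marker; a delete with a version ID removes it for good."""
        if version_id is None:
            marker_id = f"m{next(self._counter)}"
            self.history.setdefault(key, []).append((marker_id, None))
            return marker_id
        self.history[key] = [e for e in self.history.get(key, []) if e[0] != version_id]
        return version_id

    def get(self, key):
        """Return (status, data) for a simple request that names no version."""
        versions = self.history.get(key, [])
        if not versions or versions[-1][1] is None:
            return (404, None)  # no versions, or the latest entry is a delete marker
        return (200, versions[-1][1])
```

After put, put, and a simple delete of plugins.zip, the history holds v1, v2, and the marker m3, and get returns (404, None). Removing m3 first would make v2 current again, which is why the previous versions are deleted before the delete marker.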
How it works
Amazon MWAA runs a sync operation on your Amazon S3 bucket every thirty seconds. This causes any DAG deletions in an Amazon S3 bucket to be synced to the Airflow image of your Fargate container.
For plugins.zip and requirements.txt files, changes take effect only after an environment update, when Amazon MWAA builds a new Airflow image of your Fargate container with the custom plugins and Python dependencies. If you delete the current version of a requirements.txt or plugins.zip file, and then update your environment without providing a new version for the deleted file, the update fails with an error message such as "Unable to read version {version} of file {file}".
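If you update environments programmatically, you can guard against the "Unable to read version" failure by confirming that the object version still exists before you reference it. The following is a minimal sketch, assuming boto3 S3 and MWAA clients are passed in (for example, boto3.client("s3") and boto3.client("mwaa")). The UpdateEnvironment parameters PluginsS3Path, PluginsS3ObjectVersion, RequirementsS3Path, and RequirementsS3ObjectVersion belong to the Amazon MWAA API; the helper names and the kind argument are hypothetical.

```python
# Hypothetical mapping: artifact kind -> (path parameter, version parameter)
# of the Amazon MWAA UpdateEnvironment call.
ARTIFACT_PARAMS = {
    "plugins": ("PluginsS3Path", "PluginsS3ObjectVersion"),
    "requirements": ("RequirementsS3Path", "RequirementsS3ObjectVersion"),
}


def build_update_args(env_name, kind, key, version_id):
    """Build keyword arguments for update_environment for one artifact."""
    path_param, version_param = ARTIFACT_PARAMS[kind]
    return {"Name": env_name, path_param: key, version_param: version_id}


def update_with_version(s3_client, mwaa_client, env_name, bucket, kind, key, version_id):
    """Check that the version exists, then point the environment at it."""
    # head_object raises botocore's ClientError if this version was deleted or is
    # a delete marker, so the problem surfaces before the environment update starts.
    s3_client.head_object(Bucket=bucket, Key=key, VersionId=version_id)
    return mwaa_client.update_environment(
        **build_update_args(env_name, kind, key, version_id)
    )
```

Raising before update_environment is called keeps the environment untouched when the version is missing, instead of starting an update that fails later.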
Deleting a DAG on Amazon S3
A DAG file (.py) is not versioned and can be deleted directly on the Amazon S3 console. The following steps describe how to delete a DAG from your Amazon S3 bucket.
To delete a DAG
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Select the S3 bucket link in the DAG code in S3 pane to open your storage bucket on the Amazon S3 console.
- Choose the dags folder.
- Select the DAG, and then choose Delete.
- Under Delete objects?, type delete.
- Choose Delete objects.
Note
Apache Airflow preserves historical DAG runs. After a DAG has been run in Apache Airflow, it remains in the Airflow DAGs list regardless of the file status, until you delete it in Apache Airflow. To delete a DAG in Apache Airflow, choose the red "delete" button under the Links column.
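The console steps for deleting a DAG can also be scripted. The following minimal sketch assumes a boto3 S3 client and that your DAGs live under the dags/ prefix shown in the console; delete_dag_file and the key dags/my_dag.py are hypothetical. Because the call omits a version ID, it behaves like the console's Delete action. The DAG still appears in Apache Airflow until you delete it there.

```python
def delete_dag_file(s3_client, bucket, dag_key):
    """Delete one DAG file, such as the hypothetical key "dags/my_dag.py"."""
    # Guard against accidentally deleting plugins.zip or requirements.txt,
    # which are handled differently on this page.
    if not (dag_key.startswith("dags/") and dag_key.endswith(".py")):
        raise ValueError(f"not a DAG file under dags/: {dag_key}")
    # No VersionId is passed, matching a plain delete from the console.
    return s3_client.delete_object(Bucket=bucket, Key=dag_key)
```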
Removing a "current" requirements.txt or plugins.zip from an environment
Currently, there isn't a way to remove a plugins.zip or requirements.txt file from an environment after it has been added, but we're working on the issue. In the interim, a workaround is to point the environment to an empty zip file or an empty text file, respectively.
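This workaround can be scripted by uploading an empty artifact. The following minimal sketch builds a valid zip archive with no entries using Python's zipfile module, and uses an empty body for a text file. It assumes a boto3 S3 client and a versioning-enabled bucket; the function names are hypothetical. The returned version ID is the value you would pass as PluginsS3ObjectVersion or RequirementsS3ObjectVersion when you update the environment.

```python
import io
import zipfile


def empty_zip_bytes():
    """Return the bytes of a valid zip archive that contains no files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w"):
        pass  # closing the archive writes only the end-of-central-directory record
    return buffer.getvalue()


def upload_empty_artifact(s3_client, bucket, key):
    """Upload an empty plugins zip or requirements text file; return its version ID."""
    body = empty_zip_bytes() if key.endswith(".zip") else b""
    response = s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    # In a versioning-enabled bucket, put_object returns the new object's VersionId.
    return response["VersionId"]
```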
Deleting a "non-current" (previous) requirements.txt or plugins.zip version
The requirements.txt and plugins.zip files in your Amazon S3 bucket are versioned on Amazon MWAA. If you want to delete these files from your Amazon S3 bucket entirely, you must retrieve the version ID of the object (for example, version 121212 of plugins.zip), delete that version, and then remove the delete marker for the file version(s).
You can also delete "non-current" (previous) file versions on the Amazon S3 console; however, you'll still need to delete the delete marker using one of the following options.
- To retrieve the object version, see Retrieving object versions from a versioning-enabled bucket in the Amazon S3 guide.
- To delete the object version, see Deleting object versions from a versioning-enabled bucket in the Amazon S3 guide.
- To remove a delete marker, see Managing delete markers in the Amazon S3 guide.
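The retrieve, delete, and remove-marker sequence can be combined in one script. The following is a minimal sketch, assuming a boto3 S3 client; purge_previous_versions is a hypothetical name. It lists versions with the list_object_versions paginator, keeps the current version, and then deletes each previous version followed by each delete marker by passing its VersionId to delete_object, which removes that version permanently. Try it on a non-production bucket first, because these deletions can't be undone.

```python
def purge_previous_versions(s3_client, bucket, key):
    """Permanently delete the previous versions and delete markers of one key.

    Returns (deleted_version_ids, deleted_marker_ids). The current version,
    if one exists, is kept.
    """
    versions, markers = [], []
    paginator = s3_client.get_paginator("list_object_versions")
    # Collect everything first so deletions don't interfere with pagination.
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        for version in page.get("Versions", []):
            # Prefix matching can also return keys such as "plugins.zip.bak",
            # so compare the exact key, and skip the current version.
            if version["Key"] == key and not version["IsLatest"]:
                versions.append(version["VersionId"])
        for marker in page.get("DeleteMarkers", []):
            if marker["Key"] == key:
                markers.append(marker["VersionId"])

    # Delete previous versions first, then the delete markers.
    for version_id in versions + markers:
        s3_client.delete_object(Bucket=bucket, Key=key, VersionId=version_id)
    return versions, markers
```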
Using lifecycles to delete "non-current" (previous) versions and delete markers automatically
You can configure a lifecycle policy on your Amazon S3 bucket to permanently delete "non-current" (previous) versions of the plugins.zip and requirements.txt files after a specified number of days, or to remove expired object delete markers.
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Under DAG code in Amazon S3, choose your Amazon S3 bucket.
- On the Management tab, choose Create lifecycle rule.
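You can also define this kind of rule in code. The following minimal sketch builds a lifecycle configuration in the shape that the boto3 put_bucket_lifecycle_configuration call accepts, with one rule each for plugins.zip and requirements.txt. It assumes both files sit at the root of the bucket; the rule IDs and function names are hypothetical.

```python
def noncurrent_cleanup_rule(rule_id, prefix, noncurrent_days):
    """Build one lifecycle rule dict for put_bucket_lifecycle_configuration."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Permanently delete a version this many days after it becomes non-current.
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        # Remove delete markers that have no previous versions left behind them.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


def mwaa_artifact_lifecycle(noncurrent_days=30):
    """Lifecycle configuration for plugins.zip and requirements.txt at the bucket root."""
    return {
        "Rules": [
            noncurrent_cleanup_rule("plugins-zip-cleanup", "plugins.zip", noncurrent_days),
            noncurrent_cleanup_rule("requirements-txt-cleanup", "requirements.txt", noncurrent_days),
        ]
    }
```

On a bucket with no other lifecycle rules, you could apply it with s3_client.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=mwaa_artifact_lifecycle(30)).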
Example lifecycle policy to delete requirements.txt "non-current" versions and delete markers automatically
The following example shows how to create a lifecycle rule that permanently deletes "non-current" versions of a requirements.txt file and their delete markers after thirty days.
- Open the Environments page on the Amazon MWAA console.
- Choose an environment.
- Under DAG code in Amazon S3, choose your Amazon S3 bucket.
- On the Management tab, choose Create lifecycle rule.
- In Lifecycle rule name, type Delete previous requirements.txt versions and delete markers after thirty days.
- In Prefix, type requirements.
- In Lifecycle rule actions, choose Permanently delete previous versions of objects and Delete expired delete markers or incomplete multipart uploads.
- In Number of days after objects become previous versions, type 30.
- In Expired object delete markers, choose Delete expired object delete markers. With these settings, previous versions are permanently deleted 30 days after they become non-current, and a delete marker is removed once no previous versions of its object remain.
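The following sketch applies the rule from this example with boto3 while keeping any other lifecycle rules on the bucket, because put_bucket_lifecycle_configuration replaces the bucket's entire lifecycle configuration. It assumes a boto3 S3 client and uses the rule name from the steps above as the rule ID; the function names are hypothetical. The exception handling reads the error code that botocore's ClientError carries (NoSuchLifecycleConfiguration when the bucket has no rules yet) so that the sketch doesn't need to import botocore.

```python
RULE_ID = "Delete previous requirements.txt versions and delete markers after thirty days"


def requirements_rule():
    """The rule from the console steps: prefix "requirements", 30 days."""
    return {
        "ID": RULE_ID,
        "Filter": {"Prefix": "requirements"},
        "Status": "Enabled",
        "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


def merge_rule(existing_rules, new_rule):
    """Return the existing rules plus new_rule, replacing any rule with the same ID."""
    kept = [rule for rule in existing_rules if rule.get("ID") != new_rule["ID"]]
    return kept + [new_rule]


def apply_requirements_rule(s3_client, bucket):
    """Add or update the requirements rule without dropping other rules."""
    try:
        existing = s3_client.get_bucket_lifecycle_configuration(Bucket=bucket)["Rules"]
    except Exception as err:
        # botocore's ClientError exposes the S3 error code in err.response.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code != "NoSuchLifecycleConfiguration":
            raise
        existing = []  # the bucket has no lifecycle configuration yet
    rules = merge_rule(existing, requirements_rule())
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules}
    )
    return rules
```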
What's next?
- Learn more about Amazon S3 delete markers in Managing delete markers.
- Learn more about Amazon S3 lifecycles in Expiring objects.