VACUUM - Amazon Athena

VACUUM

The VACUUM statement performs table maintenance on Apache Iceberg tables by removing no longer needed data files.

Note

VACUUM is transactional and is supported only for Apache Iceberg tables in Athena engine version 3.

Running the VACUUM statement on Iceberg tables is recommended to remove data files that are no longer relevant and to reduce metadata size and storage consumption. Note that, because the VACUUM statement makes API calls to Amazon S3, charges apply for the associated requests to Amazon S3.

Warning

If you run a snapshot expiration operation, you can no longer time travel to expired snapshots.

Synopsis

To remove data files no longer needed for an Iceberg table, use the following syntax.

VACUUM [database_name.]target_table

To run VACUUM on a table with a name that begins with an underscore (for example, _mytable), enclose the table name in backticks, as in the following example. If you prefix the table name with a database name, do not enclose the database name in backticks. Note that double quotes will not work in place of backticks.

This behavior is particular to VACUUM. The CREATE and INSERT INTO statements do not require backticks for table names that begin with underscores.

VACUUM `_mytable` VACUUM my_database.`_mytable`

Note also that VACUUM expects the Iceberg data to be in an Amazon S3 folder rather than an Amazon S3 bucket. For example, if your Iceberg data is at s3://DOC-EXAMPLE-BUCKET/ instead of s3://DOC-EXAMPLE-BUCKET/myicebergfolder/, the VACUUM statement fails with the error message GENERIC_INTERNAL_ERROR: Path missing in file system location: s3://DOC-EXAMPLE-BUCKET.

Operations performed

VACUUM performs the following operations:

  • Removes snapshots that are older than the amount of time that is specified by the vacuum_max_snapshot_age_seconds table property. By default, this property is set to 432000 seconds (5 days).

  • Removes snapshots that are not within the period to be retained that are in excess of the number specified by the vacuum_min_snapshots_to_keep table property. The default is 1.

    You can specify these table properties in your CREATE TABLE statement. After the table has been created, you can use the ALTER TABLE SET PROPERTIES statement to update them.

  • Removes any metadata and data files that are unreachable as a result of the snapshot removal. You can configure the number of old metadata files to be retained by setting the vacuum_max_metadata_files_to_keep table property. The default value is 100.

  • Removes orphan files that are older than the time specified in the vacuum_max_snapshot_age_seconds table property. Orphan files are files in the table's data directory that are not part of the table state.

For more information about creating and managing Apache Iceberg tables in Athena, see Creating Iceberg tables and Managing Iceberg tables.