Deleting orphan files - AWS Glue

The AWS Glue Data Catalog lets you remove orphan files from your Iceberg tables. Orphan files are data or metadata files that the Iceberg table metadata no longer tracks but that still exist in the table's Amazon S3 location. They can accumulate over time through operations such as compaction, partition drops, or table rewrites, and they take up unnecessary storage space.

The orphan file deletion optimizer in AWS Glue scans the table metadata and the files actually present in storage, identifies the files that the metadata does not reference, and deletes them to reclaim storage space.
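The identification step described above amounts to a set difference between the files in storage and the files the metadata references. The following is a minimal sketch of that idea only, not the AWS Glue implementation. The function name `find_orphan_files`, the input shapes, and the `min_age` safety window are assumptions made for illustration. The age check is an added precaution so that files from writes still in progress are not treated as orphans.

```python
from datetime import datetime, timedelta, timezone


def find_orphan_files(tracked_paths, stored_files, now, min_age):
    """Return stored paths that the table metadata does not reference.

    tracked_paths: iterable of file paths referenced by the table metadata.
    stored_files:  dict mapping each path found in storage to its
                   last-modified datetime.
    now:           the reference time for the age check.
    min_age:       timedelta; newer files are skipped (hypothetical
                   safety window, not taken from the AWS documentation).
    """
    tracked = set(tracked_paths)
    cutoff = now - min_age
    return sorted(
        path
        for path, modified in stored_files.items()
        if path not in tracked and modified < cutoff
    )


# Example with made-up paths and timestamps.
now = datetime(2024, 6, 10, tzinfo=timezone.utc)
old = now - timedelta(days=30)
recent = now - timedelta(hours=1)

tracked = ["s3://bucket/tbl/data/a.parquet"]
stored = {
    "s3://bucket/tbl/data/a.parquet": old,     # tracked, kept
    "s3://bucket/tbl/data/b.parquet": old,     # untracked and old
    "s3://bucket/tbl/data/c.parquet": recent,  # untracked but too new
}

orphans = find_orphan_files(tracked, stored, now, timedelta(days=3))
print(orphans)
```

Only the untracked file older than the window is reported. A real implementation would also have to collect every path referenced by every retained snapshot, which this sketch takes as given.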

To start orphan file deletion, create an orphan file deletion table optimizer for the table in the Data Catalog.
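As a rough illustration, the optimizer might be created programmatically with the AWS SDK for Python (boto3). The sketch below assumes the Glue `create_table_optimizer` operation accepts the type `orphan_file_deletion` and an `orphanFileDeletionConfiguration` block with an Iceberg retention setting. Check these names against the current API reference before relying on them. The account ID, database, table, and role ARN are placeholders, and the helper function names are hypothetical.

```python
def build_orphan_optimizer_request(catalog_id, database, table, role_arn,
                                   retention_days=None):
    """Build keyword arguments for glue.create_table_optimizer.

    Parameter names reflect the Glue API as understood at the time of
    writing (an assumption); verify them against the current reference.
    """
    config = {"roleArn": role_arn, "enabled": True}
    if retention_days is not None:
        # Optional: how long an unreferenced file must age before deletion.
        config["orphanFileDeletionConfiguration"] = {
            "icebergConfiguration": {
                "orphanFileRetentionPeriodInDays": retention_days
            }
        }
    return {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "orphan_file_deletion",
        "TableOptimizerConfiguration": config,
    }


def create_orphan_optimizer(request):
    """Send the request to AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder works without boto3

    glue = boto3.client("glue")
    return glue.create_table_optimizer(**request)


# Placeholder values; replace them with your own before calling AWS.
request = build_orphan_optimizer_request(
    catalog_id="123456789012",
    database="example_db",
    table="example_iceberg_table",
    role_arn="arn:aws:iam::123456789012:role/ExampleOptimizerRole",
    retention_days=3,
)
# create_orphan_optimizer(request)  # uncomment to call the service
```

Separating the request builder from the service call keeps the configuration easy to inspect and test without touching AWS.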