Reading archived S3 objects with S3 Glacier storage classes
Amazon S3 Glacier storage classes offer low-cost storage at the expense of longer retrieval times. Unlike S3 Standard objects, archived S3 Glacier objects can’t be read as AWS Glue tables. To make the data available for analytical queries or reporting, you must first restore the S3 Glacier objects. Restoration is an asynchronous process, and the restored copy remains available only for a retention period that you specify. While the copy is available, you can copy it to a different location as an S3 Standard object. After the retention period ends, the restored copy expires and the object is again available only in its Amazon S3 Glacier storage class.
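The restore, wait, and copy sequence described above can be sketched with Boto3 as follows. This is a minimal illustration, not a definitive implementation: the function names, the default of 7 retention days, and the bucket and key names in the usage comment are assumptions made for the example. The functions take an already created client (for example `boto3.client("s3")`) so that the calls are easy to follow. Note that a single `copy_object` call is limited to objects of up to 5 GB.

```python
def request_restore(s3_client, bucket, key, days=7, tier="Standard"):
    """Start an asynchronous restore of one archived object."""
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={
            # Retention period (in days) of the temporary restored copy.
            "Days": days,
            # Retrieval tier: "Expedited", "Standard", or "Bulk".
            "GlacierJobParameters": {"Tier": tier},
        },
    )


def restore_is_complete(s3_client, bucket, key):
    """Return True once the restored copy is readable."""
    head = s3_client.head_object(Bucket=bucket, Key=key)
    # While the restore runs, the Restore field contains
    # ongoing-request="true"; afterwards it contains
    # ongoing-request="false" followed by an expiry date.
    return 'ongoing-request="false"' in head.get("Restore", "")


def copy_to_standard(s3_client, bucket, key, dest_bucket, dest_key):
    """Copy a restored object to another location as S3 Standard."""
    s3_client.copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": bucket, "Key": key},
        StorageClass="STANDARD",
    )


# Hypothetical usage (bucket and key names are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   request_restore(s3, "archive-bucket", "logs/2020/data.csv", days=3)
#   ...poll restore_is_complete(...) until it returns True...
#   copy_to_standard(s3, "archive-bucket", "logs/2020/data.csv",
#                    "analytics-bucket", "restored/data.csv")
```

Because the restore is asynchronous, the status check is meant to be polled or driven by an event notification rather than called once immediately after the request.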
Using Amazon S3 Glacier Select
Similar to using Amazon S3 Select with S3 Standard, you can query S3 Glacier objects to fetch a subset of data. This enables programmatic access to the data without preprocessing such as object restore and without other AWS analytics services. For an example, see the request syntax of the initiate_job operation, which can be used to read data with S3 Glacier Select.
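As a rough sketch of what such a request can look like, the following builds the `jobParameters` structure for a select job and passes it to the Boto3 Glacier client's `initiate_job` operation. The helper names, the CSV input assumption, and the output bucket and prefix are illustrative assumptions, and the SQL expression is left to the caller. The sketch also assumes that the data is addressed by a vault name and archive ID, as the Glacier API requires, and that S3 Glacier Select is available in your account.

```python
def build_select_job_parameters(archive_id, sql, output_bucket,
                                output_prefix, tier="Standard"):
    """Build the jobParameters dict for a Glacier select job."""
    return {
        "Type": "select",
        "ArchiveId": archive_id,
        "Tier": tier,  # "Expedited", "Standard", or "Bulk"
        "SelectParameters": {
            # Assumption: the archive is a CSV file with a header row.
            "InputSerialization": {"csv": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": sql,
            "OutputSerialization": {"csv": {}},
        },
        # Query results are written to this S3 location.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix}
        },
    }


def start_select_job(glacier_client, vault_name, job_parameters):
    """Submit the select job and return its job ID."""
    response = glacier_client.initiate_job(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        jobParameters=job_parameters,
    )
    return response["jobId"]


# Hypothetical usage (vault, archive ID, and bucket are placeholders):
#   import boto3
#   glacier = boto3.client("glacier")
#   params = build_select_job_parameters(
#       archive_id, sql, "results-bucket", "select-output")
#   job_id = start_select_job(glacier, "my-vault", params)
```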
Using S3 Batch Operations
S3 Batch Operations enables large-scale batch operations on Amazon S3, on the order of billions of objects containing exabytes of data. Amazon S3 tracks progress, sends notifications, and stores a detailed completion report of all actions, providing a fully managed, auditable, and serverless experience.
S3 Batch Operations supports the Restore operation, which initiates S3 object restore for the following storage tiers:
- Objects archived in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes
- Objects archived through the S3 Intelligent-Tiering storage class in the Archive Access or Deep Archive Access tiers
The batch operation can be invoked either programmatically or from the Amazon S3 console. For input, it requires a .csv manifest file that contains the list of objects to restore.
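A manifest of this kind can be generated with a few lines of Python. The sketch below writes one `bucket,key` row per object, with no header row and with URL-encoded object keys, which is the layout S3 Batch Operations expects for a CSV manifest. The function name and the sample keys are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote


def build_manifest_csv(bucket, keys):
    """Return CSV manifest text with one 'bucket,key' row per object."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        # Object keys in the manifest must be URL-encoded;
        # "/" is kept so that prefixes stay readable.
        writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()


manifest = build_manifest_csv(
    "archive-bucket",  # hypothetical bucket name
    ["logs/2020/jan data.csv", "logs/2020/feb.csv"],
)
print(manifest)
```

The resulting text would then be uploaded to Amazon S3, and the object's ARN and ETag supplied when the batch job is created.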
You can use an Amazon S3 Inventory report as input for the batch job. The inventory report is configured for a bucket and can be limited to objects under specific prefixes. The report is generated automatically, either daily or weekly, in CSV, ORC, or Parquet format.
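An inventory configuration with these options can be sketched with Boto3 as follows. The helper name, the configuration ID, and the bucket names are assumptions for illustration; the dictionary keys follow the `put_bucket_inventory_configuration` request structure. The `StorageClass` optional field is included so that archived objects can be identified in the report. When the report is meant to serve as a Batch Operations manifest, CSV is the format to choose, because the inventory manifest format that Batch Operations accepts is CSV-based.

```python
def build_inventory_configuration(config_id, destination_bucket_arn,
                                  prefix=None, frequency="Daily",
                                  report_format="CSV"):
    """Build an InventoryConfiguration dict for a source bucket."""
    if frequency not in ("Daily", "Weekly"):
        raise ValueError("frequency must be 'Daily' or 'Weekly'")
    if report_format not in ("CSV", "ORC", "Parquet"):
        raise ValueError("report_format must be 'CSV', 'ORC', or 'Parquet'")

    config = {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": frequency},
        # Include the storage class so archived objects can be filtered.
        "OptionalFields": ["StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": destination_bucket_arn,
                "Format": report_format,
            }
        },
    }
    if prefix:
        # Limit the report to objects under this prefix.
        config["Filter"] = {"Prefix": prefix}
    return config


def put_inventory_configuration(s3_client, source_bucket, config):
    """Attach the inventory configuration to the source bucket."""
    s3_client.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )


# Hypothetical usage (names and ARN are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   cfg = build_inventory_configuration(
#       "glacier-restore-inventory", "arn:aws:s3:::inventory-bucket",
#       prefix="logs/2020/", frequency="Weekly")
#   put_inventory_configuration(s3, "archive-bucket", cfg)
```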
For more information about configuring an inventory report, see the Amazon S3 documentation. For information about using Boto3 to create an S3 Batch Operations job, see the Boto3 documentation.
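For orientation, a restore job request for the Boto3 `s3control` client's `create_job` operation might be assembled as in the sketch below. The helper names, the priority, the report prefix, and all ARNs are assumptions for the example, and the manifest is assumed to be a plain `bucket,key` CSV file. S3 Batch Operations restore jobs accept the `BULK` and `STANDARD` retrieval tiers.

```python
import uuid


def build_restore_job_request(account_id, role_arn, manifest_arn,
                              manifest_etag, report_bucket_arn,
                              days=7, tier="BULK"):
    """Build keyword arguments for s3control.create_job (restore)."""
    if tier not in ("BULK", "STANDARD"):
        raise ValueError("tier must be 'BULK' or 'STANDARD'")
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,
        "Operation": {
            "S3InitiateRestoreObject": {
                "ExpirationInDays": days,  # retention of restored copies
                "GlacierJobTier": tier,
            }
        },
        "Manifest": {
            "Spec": {
                "Format": "S3BatchOperations_CSV_20180820",
                "Fields": ["Bucket", "Key"],
            },
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "restore-reports",  # hypothetical report prefix
            "ReportScope": "AllTasks",
        },
        "Priority": 10,
        "RoleArn": role_arn,
        # Idempotency token so that retries don't create duplicate jobs.
        "ClientRequestToken": str(uuid.uuid4()),
        "Description": "Restore archived objects",
    }


def create_restore_job(s3control_client, request):
    """Submit the job and return its ID."""
    return s3control_client.create_job(**request)["JobId"]


# Hypothetical usage (account ID and ARNs are placeholders):
#   import boto3
#   s3control = boto3.client("s3control")
#   req = build_restore_job_request(
#       "111122223333", "arn:aws:iam::111122223333:role/batch-restore",
#       "arn:aws:s3:::manifest-bucket/manifest.csv", manifest_etag,
#       "arn:aws:s3:::report-bucket")
#   job_id = create_restore_job(s3control, req)
```

The IAM role passed as `RoleArn` needs permission to read the manifest, initiate restores on the listed objects, and write the completion report.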