Backup and recovery using Amazon S3
You can use Amazon Simple Storage Service (Amazon S3) to store and retrieve any amount of data, at any time. You can use Amazon S3 as your durable store for your application data and file-level backup and restore processes. For example, you can copy your database backups from a database instance to Amazon S3 with a backup script using the AWS CLI or AWS SDKs.
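For instance, the following minimal sketch uses the AWS SDK for Python (boto3) to copy a local database backup to a bucket. The bucket name, object key, and file path are placeholders for illustration.

```python
import boto3

s3 = boto3.client("s3")

# Copy a local database backup file to S3 for durable, offsite storage.
# upload_file transparently switches to multipart uploads for large files.
s3.upload_file(
    "/var/backups/mydb.sql.gz",   # placeholder local backup file
    "example-backup-bucket",      # placeholder bucket name
    "database/mydb/mydb.sql.gz",  # placeholder object key
)
```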
AWS services use Amazon S3 for highly durable and reliable storage, as in the following examples:
- Amazon EC2 uses Amazon S3 to store Amazon EBS snapshots of EBS volumes and backups of EC2 instance stores.
- Storage Gateway integrates with Amazon S3 to provide on-premises environments with Amazon S3 backed file shares, volumes, and tape libraries.
- Amazon RDS uses Amazon S3 for database snapshots.
Many third-party backup solutions also use Amazon S3. For example, Arcserve Unified Data Protection supports Amazon S3 for durable backup of on-premises and cloud-native servers.
You can use the Amazon S3 integrated features of these services to simplify your backup and recovery approach. At the same time, you can benefit from the high durability and availability provided by Amazon S3.
Amazon S3 stores data as objects within resources called buckets. You can store as many objects as you want in a bucket. You can write, read, and delete objects in your bucket with fine-grained access control. Single objects can be up to 5 TB in size.
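As a brief sketch of this object model, the following boto3 example (with a placeholder bucket and key) writes, reads, and then deletes a single object:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder name for illustration

# Write: store a small configuration file as an object.
s3.put_object(
    Bucket=bucket,
    Key="config/app.ini",
    Body=b"[app]\nretention_days = 14\n",
)

# Read: retrieve the object's contents.
body = s3.get_object(Bucket=bucket, Key="config/app.ini")["Body"].read()
print(body.decode())

# Delete: remove the object from the bucket.
s3.delete_object(Bucket=bucket, Key="config/app.ini")
```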
Using Amazon S3 storage classes to reduce backup data storage costs
Amazon S3 offers multiple storage classes for use in on-premises, hybrid, and cloud-native architectures. All storage classes provide scalable capacity that requires no volume or media management as your backup datasets grow. The pay-for-what-you-use model and low cost per GB/month make Amazon S3 storage classes a good fit for a broad range of data-protection use cases. Amazon S3 storage classes are designed for different use cases, including the following categories:
- Frequent access storage classes for general-purpose storage of frequently accessed data (for example, configuration files, unplanned backups, daily backups). This category includes the S3 Standard storage class, which is the default for all Amazon S3 objects.
- Infrequent access storage classes for long-lived but infrequently accessed data (for example, monthly backups). This category includes the S3 Standard-IA storage class, where IA stands for infrequent access.
- S3 Glacier storage classes for extremely long-lived data that rarely needs to be accessed (for example, yearly backups). This category includes S3 Glacier Deep Archive, which provides the lowest-cost storage on AWS.
For backups with unknown or changing access patterns, you can use the S3 Intelligent-Tiering storage class. S3 Intelligent-Tiering automatically transitions objects to the most cost-effective tier based on how many days ago an object was last accessed.
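To place a backup directly into one of these storage classes, you can set the storage class at upload time. A minimal boto3 sketch follows; the bucket, file, and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder

# Monthly backup: infrequently accessed, so store it in S3 Standard-IA.
s3.upload_file("monthly.tar.gz", bucket, "monthly/2025-01.tar.gz",
               ExtraArgs={"StorageClass": "STANDARD_IA"})

# Yearly backup: rarely accessed, so store it in S3 Glacier Deep Archive.
s3.upload_file("yearly.tar.gz", bucket, "yearly/2025.tar.gz",
               ExtraArgs={"StorageClass": "DEEP_ARCHIVE"})

# Unknown access pattern: let S3 Intelligent-Tiering pick the tier.
s3.upload_file("adhoc.tar.gz", bucket, "adhoc/2025-01-15.tar.gz",
               ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"})
```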
Note
Amazon S3 offers lifecycle policies that you can configure to manage your data throughout its lifecycle. After a policy is set, your data will be automatically migrated to the appropriate storage class without any changes to your application. For more information, see the Amazon S3 object lifecycle management documentation.
To reduce your costs for backup, use a tiered storage class approach based on your recovery time objective (RTO) and recovery point objective (RPO), as in the following example (implemented as a lifecycle configuration in the sketch after this list):
- Daily backups for the past 2 weeks using S3 Standard
- Weekly backups for the past 3 months using S3 Standard-IA
- Quarterly backups for the past year on S3 Glacier Flexible Retrieval
- Yearly backups for the past 5 years on S3 Glacier Deep Archive
- Backups deleted from S3 Glacier Deep Archive after the 5-year mark
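The following sketch implements this tiered schedule as an S3 lifecycle configuration with boto3. The daily/, weekly/, quarterly/, and yearly/ prefixes are assumptions about how the backup sets are organized. Note that S3 requires objects to be stored for at least 30 days before they can transition to S3 Standard-IA, so the weekly rule transitions at day 30.

```python
import boto3

s3 = boto3.client("s3")

# Prefixes (daily/, weekly/, ...) are illustrative assumptions about
# how the backup sets are organized within the bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            # Daily backups: keep 2 weeks in S3 Standard, then delete.
            {"ID": "daily", "Filter": {"Prefix": "daily/"}, "Status": "Enabled",
             "Expiration": {"Days": 14}},
            # Weekly backups: move to S3 Standard-IA (30-day minimum
            # before IA transitions), keep for 3 months.
            {"ID": "weekly", "Filter": {"Prefix": "weekly/"}, "Status": "Enabled",
             "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
             "Expiration": {"Days": 90}},
            # Quarterly backups: S3 Glacier Flexible Retrieval for 1 year.
            {"ID": "quarterly", "Filter": {"Prefix": "quarterly/"}, "Status": "Enabled",
             "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
             "Expiration": {"Days": 365}},
            # Yearly backups: S3 Glacier Deep Archive, deleted after 5 years.
            {"ID": "yearly", "Filter": {"Prefix": "yearly/"}, "Status": "Enabled",
             "Transitions": [{"Days": 0, "StorageClass": "DEEP_ARCHIVE"}],
             "Expiration": {"Days": 1825}},
        ]
    },
)
```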
Creating standard S3 buckets for backup and archive
You can create a standard S3 bucket for backup and archive with your corporation’s backup and retention policy implemented through S3 lifecycle policies. Cost allocation tagging and reporting for AWS billing is based on the tags assigned at the bucket level. If cost allocation is important, create separate backup and archive S3 buckets for each project or business unit so that you can allocate costs accordingly.
Your backup scripts and applications can use the backup and archive S3 bucket that you create to store point-in-time snapshots for application and workload data. You can create a standard S3 prefix to help you organize your point-in-time data snapshots. For example, if you create hourly backups, consider using a backup prefix such as YYYY/MM/DD/HH/<WorkloadName>/<files...>. By doing this, you can quickly retrieve your point-in-time backups manually or programmatically.
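As a sketch of this naming scheme (boto3, with placeholder bucket, workload, and file names), the following builds an hourly YYYY/MM/DD/HH/<WorkloadName>/ key and later lists the backups for that hour:

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder
workload = "orders-db"            # placeholder workload name

# Build the YYYY/MM/DD/HH/<WorkloadName>/ prefix from the current UTC hour.
now = datetime.now(timezone.utc)
prefix = f"{now:%Y/%m/%d/%H}/{workload}/"

# Store this hour's snapshot under the point-in-time prefix.
s3.upload_file("snapshot.tar.gz", bucket, prefix + "snapshot.tar.gz")

# Retrieve the snapshots for a given point in time programmatically.
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```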
Using Amazon S3 versioning to automatically maintain rollback history
You can enable S3 object versioning to maintain a history of object changes, including the ability to revert to a previous version. This is useful for configuration files and other objects that might change more frequently than your point-in-time backup schedule. It’s also useful for files that must be reverted individually.
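A minimal sketch of this pattern with boto3 (bucket and key names are placeholders): enable versioning, then roll an object back by copying an older version on top as the new current version.

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "example-backup-bucket", "config/app.ini"  # placeholders

# Enable versioning so every overwrite keeps the prior version.
s3.put_bucket_versioning(
    Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
)

# List the object's version history (newest first for a single key).
versions = s3.list_object_versions(Bucket=bucket, Prefix=key).get("Versions", [])
for v in versions:
    print(v["VersionId"], v["LastModified"], "latest" if v["IsLatest"] else "")

# Revert: copy a previous version over the current one, making it the
# latest version (the full version history itself is preserved).
if len(versions) > 1:
    previous = versions[1]  # versions[0] is the current version
    s3.copy_object(
        Bucket=bucket, Key=key,
        CopySource={"Bucket": bucket, "Key": key,
                    "VersionId": previous["VersionId"]},
    )
```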
Using Amazon S3 to back up and recover customized configuration files for AMIs
Amazon S3 with object versioning can become your system of record for your workload configuration and option files. For example, you might use a standard AWS Marketplace Amazon EC2 image that is maintained by an ISV. This image might contain software whose configuration is maintained in a number of configuration files. You can maintain your customized configuration files in Amazon S3. When your instance is launched, you can copy these configuration files to your instance as a part of your instance user data. When you apply this approach, you don’t need to customize and recreate an AMI to use an updated version.
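One way to sketch this approach follows, with the assumptions that a Python script is invoked from the instance user data at launch and that the customized files live under a config/<WorkloadName>/ prefix; the bucket and paths are placeholders.

```python
import os

import boto3

s3 = boto3.client("s3")
bucket = "example-backup-bucket"  # placeholder
prefix = "config/orders-app/"     # placeholder prefix for this workload
dest_dir = "/etc/orders-app"      # placeholder destination on the instance

# Download every customized configuration file for this workload.
# This script could be invoked from the EC2 instance user data at launch,
# so a standard AMI picks up the current configuration without rebuilding.
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", []):
    filename = obj["Key"][len(prefix):]
    if not filename:
        continue  # skip the prefix marker object, if present
    local_path = os.path.join(dest_dir, filename)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    s3.download_file(bucket, obj["Key"], local_path)
```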
Using Amazon S3 in your custom backup and restore process
Amazon S3 provides a general-purpose backup store that you can quickly integrate into your existing custom backup processes. You can use the AWS CLI, AWS SDKs, and API operations to integrate Amazon S3 into your backup and restore scripts and processes. For example, you might have a database backup script that performs nightly database exports. You can customize this script to copy your nightly backups to Amazon S3 for offsite storage. For a step-by-step example, see the Batch upload files to the cloud tutorial.
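As an illustration, the following sketch extends a nightly export in this way. The export command, bucket name, and key prefix are hypothetical placeholders; substitute your database's own export tooling.

```python
import subprocess
from datetime import datetime, timezone

import boto3

# Hypothetical nightly export step; replace with your database's
# own dump or export command.
dump_path = "/var/backups/nightly.sql.gz"
subprocess.run(
    ["sh", "-c", f"pg_dump mydb | gzip > {dump_path}"], check=True
)

# Offsite copy: upload the export to Amazon S3 under a dated key.
s3 = boto3.client("s3")
key = f"nightly/{datetime.now(timezone.utc):%Y/%m/%d}/nightly.sql.gz"
s3.upload_file(dump_path, "example-backup-bucket", key)  # placeholder bucket
```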
You can take a similar approach for exporting and backing up data for different applications based on their individual RPO. Additionally, you can use AWS Systems Manager to run your backup scripts on your managed instances. Systems Manager provides automation, access control, scheduling, logging, and notification for your individual backup processes.
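For example, the following sketch starts such a backup script on a managed instance through Systems Manager Run Command; the instance ID and script path are placeholders.

```python
import boto3

ssm = boto3.client("ssm")

# Run the backup script on a managed instance by using the
# AWS-RunShellScript document; Systems Manager records the invocation.
response = ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],  # placeholder instance ID
    DocumentName="AWS-RunShellScript",
    Parameters={"commands": ["/opt/backup/nightly-backup.sh"]},  # placeholder
)
print("Command ID:", response["Command"]["CommandId"])
```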
Securing backup data in Amazon S3
Data security is a universal concern, and AWS takes security very seriously. Security is the foundation of every AWS service. Amazon S3 provides capabilities for access control and for encryption both at rest and in transit. All Amazon S3 endpoints support SSL/TLS for encrypting data in transit. For data at rest, you can use server-side encryption with Amazon S3 managed keys (SSE-S3), with AWS KMS keys (SSE-KMS), or with customer-provided keys (SSE-C), or you can encrypt your data on the client side before you upload it.
You can use AWS Identity and Access Management (IAM) to control access to S3 objects. IAM provides control over permissions for individual objects and specific prefix paths within an S3 bucket. You can audit access to S3 objects by using object-level logging with AWS CloudTrail.
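For instance, the following sketch sets default server-side encryption on a backup bucket with an AWS KMS key; the bucket name and key alias are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Apply SSE-KMS as the default encryption for all new objects in the
# bucket; S3 Bucket Keys reduce the volume of requests to AWS KMS.
s3.put_bucket_encryption(
    Bucket="example-backup-bucket",  # placeholder
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/backup-key",  # placeholder alias
                },
                "BucketKeyEnabled": True,
            }
        ]
    },
)
```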