AWS Storage Gateway
User Guide (API Version 2012-06-30)

How AWS Storage Gateway Works

The AWS Storage Gateway service architecture integrates your organization's on-premises IT environment with the AWS storage infrastructure. AWS Storage Gateway provides the following two storage options to enable this integration.

  • Gateway-cached volumes enable you to use Amazon S3 as your primary data storage while retaining frequently accessed data locally in your AWS Storage Gateway. Gateway-cached volumes minimize the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to their frequently accessed data. You can create storage volumes up to 32 TiB in size and attach them as iSCSI devices from your on-premises application servers. Data written to these volumes is stored in Amazon S3 and retained, along with recently read data, in your on-premises AWS Storage Gateway's cache and upload buffer storage.

    Gateway-cached volumes can range from 1 GiB to 32 TiB in size and must be rounded to the nearest GiB. Each gateway configured for gateway-cached volumes can support up to 20 volumes and a total volume storage of 150 TiB.

  • Gateway-stored volumes enable you to store your primary data locally, while asynchronously backing up that data to AWS. Gateway-stored volumes provide your on-premises applications with low-latency access to their entire data sets, while providing durable, off-site backups. You can create storage volumes up to 1 TiB in size and mount them as iSCSI devices from your on-premises application servers. Data written to your gateway-stored volumes is stored on your on-premises storage hardware, and asynchronously backed up to Amazon S3 in the form of Amazon EBS snapshots.

    Gateway-stored volumes can range from 1 GiB to 1 TiB in size and must be rounded to the nearest GiB. Each gateway configured for gateway-stored volumes can support up to 12 volumes and a total volume storage of 12 TiB.
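
The size and count limits above can be checked before you provision anything. The following is a minimal sketch in Python, not part of the service or its SDK: the names VolumeLimits, LIMITS, and check_volume_plan are hypothetical, and the numbers are copied from this page, so they apply only to this API version.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
TIB = 1024 ** 4


@dataclass(frozen=True)
class VolumeLimits:
    """Per-gateway limits quoted in this guide for one volume type."""
    min_volume_bytes: int
    max_volume_bytes: int
    max_volumes: int
    max_total_bytes: int


# Values copied from the text above (hypothetical table, not an SDK constant).
LIMITS = {
    "cached": VolumeLimits(1 * GIB, 32 * TIB, 20, 150 * TIB),
    "stored": VolumeLimits(1 * GIB, 1 * TIB, 12, 12 * TIB),
}


def check_volume_plan(kind: str, sizes_bytes: list[int]) -> list[str]:
    """Return a list of problems; an empty list means the plan fits."""
    limits = LIMITS[kind]
    problems = []
    if len(sizes_bytes) > limits.max_volumes:
        problems.append(
            f"too many volumes: {len(sizes_bytes)} > {limits.max_volumes}"
        )
    for i, size in enumerate(sizes_bytes):
        if size % GIB != 0:
            problems.append(f"volume {i}: size is not a whole number of GiB")
        if not limits.min_volume_bytes <= size <= limits.max_volume_bytes:
            problems.append(f"volume {i}: size outside allowed range")
    if sum(sizes_bytes) > limits.max_total_bytes:
        problems.append("total volume storage exceeds the per-gateway limit")
    return problems
```

For example, four 32 TiB cached volumes (128 TiB) pass the check, while a fifth pushes the total to 160 TiB and is reported as exceeding the 150 TiB limit.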

In both cases, AWS Storage Gateway takes snapshots, makes incremental backups, and stores them in AWS.

AWS Storage Gateway: Gateway-Cached Volume Architecture

In the gateway-cached volume solution, AWS Storage Gateway stores all your on-premises application data in a storage volume in Amazon S3.

The following diagram provides an overview of the AWS Storage Gateway's cached volume deployment.

Once you've installed the AWS Storage Gateway software appliance, a virtual machine (VM), on a host in your data center and activated it, you can use the AWS Management Console to provision storage volumes backed by Amazon S3. You can also provision storage volumes programmatically using the AWS Storage Gateway API or the AWS SDK libraries. You then mount these storage volumes on your on-premises application servers as iSCSI devices.
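
As an illustration of programmatic provisioning, the following Python sketch builds the arguments for the CreateCachediSCSIVolume operation and sends them through an SDK client. It assumes the present-day AWS SDK for Python (boto3), whose storagegateway client exposes create_cached_iscsi_volume; the helper names cached_volume_request and provision_cached_volume are hypothetical, and the ARN and IP address values you pass in are your own.

```python
import uuid

GIB = 1024 ** 3


def cached_volume_request(gateway_arn, size_gib, target_name,
                          network_interface_ip, client_token=None):
    """Build keyword arguments for the CreateCachediSCSIVolume operation."""
    if size_gib < 1:
        raise ValueError("volume size must be at least 1 GiB")
    return {
        "GatewayARN": gateway_arn,
        "VolumeSizeInBytes": size_gib * GIB,
        "TargetName": target_name,
        # The gateway's network interface is identified by its IP address.
        "NetworkInterfaceId": network_interface_ip,
        # An idempotency token; generated here if the caller supplies none.
        "ClientToken": client_token or str(uuid.uuid4()),
    }


def provision_cached_volume(client, **kwargs):
    """Send the request through an SDK client and return the new volume ARN.

    `client` is assumed to be boto3.client("storagegateway") or any object
    with a compatible create_cached_iscsi_volume method.
    """
    request = cached_volume_request(**kwargs)
    response = client.create_cached_iscsi_volume(**request)
    return response["VolumeARN"]
```

Keeping the request builder separate from the call makes it possible to inspect or log the parameters without contacting AWS.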

You also allocate disks on-premises for the VM. These on-premises disks serve the following purposes:

  • Disks for use by the gateway as cache storage—As your applications write data to the storage volumes in AWS, the gateway initially stores the data on the on-premises disks referred to as cache storage before uploading it to Amazon S3. The cache storage acts as the on-premises durable store for data that is pending upload to Amazon S3 from the upload buffer.

    The cache storage also enables the gateway to store your application's recently accessed data on-premises for low-latency access. If your application requests data, the gateway first checks the cache storage for the data before checking Amazon S3.

    Two guidelines govern how much disk space to allocate for the cache storage. As a general rule, you should allocate at least 20 percent of your existing file store size. In addition, the cache storage should be larger than the upload buffer, so that it can persistently hold all data in the upload buffer that has not yet been uploaded to Amazon S3.

  • Disks for use by the gateway as the upload buffer—To prepare for upload to Amazon S3, your gateway also stores incoming data in a staging area, referred to as an upload buffer. Your gateway uploads this buffer data over an encrypted SSL connection to AWS where it is stored encrypted in Amazon S3.
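
The two cache sizing guidelines above reduce to a small calculation. The following Python sketch is only an illustration of that arithmetic; the function names are hypothetical, and rounding up to a whole GiB is a convenience assumed here, not a documented gateway requirement.

```python
GIB = 1024 ** 3


def minimum_cache_bytes(file_store_bytes: int, upload_buffer_bytes: int) -> int:
    """Smallest cache size that satisfies both guidelines from the text."""
    # Guideline 1: at least 20 percent of the existing file store (rounded up).
    twenty_percent = (file_store_bytes * 20 + 99) // 100
    # Guideline 2: strictly larger than the upload buffer.
    larger_than_buffer = upload_buffer_bytes + 1
    return max(twenty_percent, larger_than_buffer)


def round_up_to_gib(size_bytes: int) -> int:
    """Round a byte count up to a whole number of GiB."""
    return -(-size_bytes // GIB) * GIB
```

With a 1,000 GiB file store and a 100 GiB upload buffer, the first guideline dominates and the result is 200 GiB. With a 100 GiB file store and a 50 GiB upload buffer, the second guideline dominates, and rounding up gives 51 GiB.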

You can take incremental backups, called snapshots, of your storage volumes in Amazon S3. These point-in-time snapshots are also stored in Amazon S3 as Amazon EBS snapshots. When you take a new snapshot, only the data that has changed since your last snapshot is stored. You can initiate snapshots on a scheduled or ad-hoc basis. When you delete a snapshot, only the data not needed for any other snapshots is removed.
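
The incremental behavior described above can be pictured with a toy model. The Python sketch below is not how Amazon EBS snapshots are implemented; it is a simplified, hypothetical SnapshotStore class that stores only blocks changed since the previous snapshot and, on deletion, frees only blocks that no remaining snapshot references.

```python
class SnapshotStore:
    """Toy model of incremental, point-in-time snapshots of a block volume."""

    def __init__(self):
        self.blocks = {}      # (snapshot_id, block_index) -> stored data
        self.snapshots = {}   # snapshot_id -> {block_index: key into blocks}
        self.latest = None    # id of the most recently taken snapshot

    def take(self, snapshot_id, volume):
        """Snapshot `volume` (block_index -> bytes); return blocks newly stored."""
        previous = self.snapshots.get(self.latest, {})
        index, stored = {}, 0
        for block, data in volume.items():
            old_key = previous.get(block)
            if old_key is not None and self.blocks[old_key] == data:
                index[block] = old_key       # unchanged: reference existing data
            else:
                key = (snapshot_id, block)
                self.blocks[key] = data      # changed or new: store it
                index[block] = key
                stored += 1
        self.snapshots[snapshot_id] = index
        self.latest = snapshot_id
        return stored

    def delete(self, snapshot_id):
        """Delete a snapshot; return how many stored blocks were freed."""
        index = self.snapshots.pop(snapshot_id)
        still_needed = {
            key for other in self.snapshots.values() for key in other.values()
        }
        freed = [key for key in set(index.values()) if key not in still_needed]
        for key in freed:
            del self.blocks[key]
        return len(freed)

    def restore(self, snapshot_id):
        """Rebuild the full volume contents as of the given snapshot."""
        index = self.snapshots[snapshot_id]
        return {block: self.blocks[key] for block, key in index.items()}
```

In this model, taking a second snapshot after changing one of two blocks stores one block, and deleting the first snapshot frees only the block that the second snapshot no longer shares.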

You can restore an Amazon EBS snapshot to a gateway storage volume if you need to recover a backup of your data. We plan to add support for Amazon EC2 deployable gateways in the near future, enabling you to restore your snapshot to an Amazon EC2 gateway storage volume. Alternatively, for snapshots up to 1 TiB in size, you can use the snapshot as a starting point for a new Amazon EBS volume, which you can then attach to an Amazon EC2 instance.

All gateway-cached volume data and snapshot data is stored in Amazon S3, encrypted at rest using server-side encryption (SSE). However, you cannot access this data using Amazon S3 APIs or with other tools such as the Amazon S3 console.

AWS Storage Gateway: Gateway-Stored Volume Architecture

In the gateway-stored volume solution, you maintain your volume storage on-premises in your data center. That is, you store all your application data on your on-premises storage hardware. The gateway then securely uploads data to the AWS cloud for cost-effective backup and rapid disaster recovery. This is an ideal solution if you need low-latency access to all your data, and therefore want to keep it on-premises, while maintaining backups in AWS.

The following diagram provides an overview of the AWS Storage Gateway's stored volume deployment.

Once you've installed the AWS Storage Gateway software appliance, a virtual machine (VM), on a host in your data center and activated it, you can create gateway storage volumes and map them to on-premises Direct Attached Storage (DAS) or Storage Area Network (SAN) disks. You can start with either new disks or disks already holding data. You can then mount these storage volumes on your on-premises application servers as iSCSI devices. As your on-premises applications write data to and read data from a gateway's storage volume, this data is stored on and retrieved from the volume's assigned disk.

To prepare data for upload to Amazon S3, your gateway also stores incoming data in a staging area, referred to as an upload buffer. You can use on-premises DAS or SAN disks for the upload buffer. Your gateway uploads data from the upload buffer over an encrypted SSL connection to the AWS Storage Gateway service running in the AWS cloud. The service then stores the data encrypted in Amazon S3.
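
The gateway-stored data path described above can be summarized with a toy model: writes land on the local disk immediately and are staged in the upload buffer, and a separate step drains the buffer to AWS. The Python sketch below is a simplified illustration only; the StoredVolumeGateway class and its fields are hypothetical, and it omits encryption, snapshots, and failure handling.

```python
from collections import deque


class StoredVolumeGateway:
    """Toy model of the gateway-stored write, read, and upload paths."""

    def __init__(self):
        self.local_disk = {}          # block_index -> data; primary copy on DAS/SAN
        self.upload_buffer = deque()  # staged writes waiting for upload
        self.uploaded = {}            # stands in for data the service holds in Amazon S3

    def write(self, block, data):
        self.local_disk[block] = data             # stored on the assigned local disk
        self.upload_buffer.append((block, data))  # staged for asynchronous upload

    def read(self, block):
        return self.local_disk[block]             # served from the local disk

    def upload_pending(self, max_items=None):
        """Drain staged writes in order; return how many were sent."""
        sent = 0
        while self.upload_buffer and (max_items is None or sent < max_items):
            block, data = self.upload_buffer.popleft()
            self.uploaded[block] = data
            sent += 1
        return sent
```

Because reads are served from the local disk, application latency in this model does not depend on how much of the upload buffer has been drained.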

You can take incremental backups, called snapshots, of your storage volumes. The gateway stores these snapshots in Amazon S3 as Amazon EBS snapshots. When you take a new snapshot, only the data that has changed since your last snapshot is stored. You can initiate snapshots on a scheduled or ad-hoc basis. When you delete a snapshot, only the data not needed for any other snapshot is removed.
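
An ad-hoc snapshot can also be initiated through the API. The following Python sketch assumes the present-day AWS SDK for Python (boto3), whose storagegateway client provides create_snapshot with VolumeARN and SnapshotDescription parameters; the wrapper name take_adhoc_snapshot is hypothetical, and the volume ARN is a value you supply.

```python
def take_adhoc_snapshot(client, volume_arn, description):
    """Initiate a snapshot of one gateway volume and return its snapshot ID.

    `client` is assumed to be boto3.client("storagegateway") or any object
    with a compatible create_snapshot method.
    """
    if not description:
        raise ValueError("a snapshot description is required")
    response = client.create_snapshot(
        VolumeARN=volume_arn,
        SnapshotDescription=description,
    )
    return response["SnapshotId"]
```

The returned ID identifies the resulting Amazon EBS snapshot, which you can later use to restore a gateway storage volume.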

You can restore an Amazon EBS snapshot to an on-premises gateway storage volume in the event that you need to recover a backup of your data. You can also use the snapshot as a starting point for a new Amazon EBS volume, which you can then attach to an Amazon EC2 instance.