AWS managed migration tools
AWS has designed several sophisticated services to help with cloud data migration.
AWS Direct Connect
Using AWS Direct Connect, you can establish a dedicated network connection from your premises to AWS at speeds starting at 50 Mbps and up to 100 Gbps. You can use the connection to access Amazon Virtual Private Cloud (Amazon VPC) as well as public AWS services.
AWS Direct Connect is not in itself a data transfer service. Rather, it provides a high-bandwidth connection that can be used to transfer data between your corporate network and AWS with more consistent performance, and without ever routing the data over the Internet. Encryption methods, such as AWS Site-to-Site VPN, can be applied to secure data transfers over AWS Direct Connect. AWS APN Partners can help you establish network connectivity at any of the 108 Direct Connect locations.
With AWS Direct Connect, you pay only for what you use, and there is no minimum fee associated with using the service. AWS Direct Connect has two pricing components: a port-hour rate (based on port speed) and data transfer out (per GB per month). Additionally, if you are using an APN Partner to facilitate an AWS Direct Connect connection, contact the partner to discuss any fees they may charge. For information about pricing, see AWS Direct Connect pricing.
AWS Snow Family
The AWS Snow Family is a collection of physical devices that help you migrate large amounts of data into and out of the cloud without depending on your network bandwidth.
With AWS Snowball, you have a choice of two devices as of this writing: Snowball Edge Compute Optimized, with more computing capability, suited for higher-performance workloads; or Snowball Edge Storage Optimized, with more storage, suited for large-scale data migrations and capacity-oriented workloads.
Snowball Edge Compute Optimized provides powerful computing resources for use cases such as machine learning, full motion video analysis, analytics, and local computing stacks. These capabilities include 52 vCPUs, 208 GiB of memory, and an optional NVIDIA Tesla V100 GPU. For storage, the device provides 42 TB usable HDD capacity for S3 compatible object storage or EBS-compatible block volumes, as well as 7.68 TB of usable NVMe SSD capacity for EBS-compatible block volumes. Snowball Edge Compute Optimized devices run Amazon EC2 sbe-c and sbe-g instances, which are equivalent to C5, M5a, G3, and P3 instances.
Snowball Edge Storage Optimized devices are well suited for large-scale data migrations and recurring transfer workflows, as well as local computing with higher capacity needs. Snowball Edge Storage Optimized provides 80 TB of HDD capacity for block volumes and Amazon S3-compatible object storage, and 1 TB of SATA SSD for block volumes. For computing resources, the device provides 40 vCPUs, and 80 GiB of memory to support Amazon EC2 sbe1 instances (equivalent to C5).
You transfer your data directly onto the Snowball Edge device using your on-premises high-speed connections and ship the device to an AWS facility, where AWS transfers the data off the device using Amazon's high-speed internal network. The data transfer process bypasses your corporate Internet connection and removes the need for an AWS Direct Connect connection. For datasets of significant size, AWS Snowball is often faster than transferring data via the Internet and more cost-effective than upgrading your data center's Internet connection. AWS Snowball supports importing data into and exporting data from Amazon S3 buckets. From there, the data can be copied or moved to other AWS services such as Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS), Amazon FSx File Gateway, and Amazon Glacier.
AWS Snowball is ideal for transferring large amounts of data, up to many petabytes, in and out of the AWS cloud securely. This approach is effective, especially in cases where you don’t want to make expensive upgrades to your network infrastructure; if you frequently experience large backlogs of data; if you are in a physically isolated environment; or if you are in an area where high-speed Internet connections are not available or cost-prohibitive. In general, if loading your data over the Internet would take a week or more, you should consider using AWS Snow Family.
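The "week or more" rule of thumb above is easy to check with a quick back-of-the-envelope estimate. The helper below is an illustrative sketch; the 80% sustained-utilization default is an assumption for planning purposes, not an AWS figure:

```python
def transfer_days(dataset_tb: float, bandwidth_mbps: float,
                  utilization: float = 0.8) -> float:
    """Estimate the days needed to move dataset_tb terabytes over a link
    of bandwidth_mbps megabits per second at the given sustained utilization."""
    total_bits = dataset_tb * 1e12 * 8                       # decimal TB -> bits
    seconds = total_bits / (bandwidth_mbps * 1e6 * utilization)
    return seconds / 86400

# 100 TB over a dedicated 1 Gbps link takes well over a week,
# so a Snow device is worth considering.
print(round(transfer_days(100, 1000), 1))  # 11.6
```

At 100 TB the online transfer already exceeds the one-week threshold even with the link fully dedicated to the migration, which is the situation where the Snow Family pays off.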
Common use cases include cloud migration, disaster recovery, data center decommission, and content distribution. When you decommission a data center, many steps are involved to make sure valuable data is not lost, and the AWS Snow Family can help ensure data is securely and cost-effectively transferred to AWS. In a content distribution scenario, you might use Snowball Edge devices if you regularly receive or need to share large amounts of data with clients, customers, or business partners. Snowball appliances can be sent directly from AWS to client or customer locations.
If you need to move massive amounts of data, AWS Snowmobile is an exabyte-scale data transfer service. Each Snowmobile is a 45-foot-long ruggedized shipping container, hauled by a trailer truck, with up to 100 PB of data storage capacity. Snowmobile also handles all of the logistics: AWS personnel transport and configure the Snowmobile, and they work with your team to connect a temporary high-speed network switch to your local network. This local high-speed network facilitates rapid transfer of data from your data center to the Snowmobile. Once you've loaded all your data, the Snowmobile drives back to AWS, where the data is imported into Amazon S3.
Moving data at this massive scale requires additional preparation, precautions, and security. Snowmobile uses GPS tracking, round-the-clock video surveillance, and dedicated security personnel, and offers an optional security escort vehicle while your data is in transit to AWS. Management of and access to the shipping container and the data stored within it are limited to AWS personnel using hardware-based secure access control methods.
AWS Snow Family might not be the ideal solution if your data can be transferred over the Internet in less than one week, or if your applications cannot tolerate the offline transfer time.
With the AWS Snow Family, as with most other AWS services, you pay only for what you use. Snowball has three pricing components: a service fee (per job), extra-day charges as required, and data transfer out. The service fee includes the first 5 days of on-site Snowcone usage or the first 10 days of on-site Snowball usage; additional days are billed separately. For the destination storage, standard Amazon S3 storage pricing applies. For pricing information, see AWS Snowball pricing.
AWS Storage Gateway
The AWS Storage Gateway service connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between your on-premises IT environment and the AWS storage infrastructure.
You can download the AWS Storage Gateway software appliance as a virtual machine (VM) image that you install on a host in your data center or as an EC2 instance. After you’ve installed your gateway and associated it with your AWS account through the AWS activation process, you can use the AWS Management Console to create gateway-cached volumes, gateway-stored volumes, or a gateway–virtual tape library (VTL), each of which can be mounted as an iSCSI device by your on-premises applications.
Volume Gateway supports iSCSI connections that let you store volume data in Amazon S3. With caching enabled, you can use Amazon S3 to hold your complete set of data while caching the frequently accessed portion locally on premises. Gateway-cached volumes minimize the need to scale your on-premises storage infrastructure, while still providing your applications with low-latency access to frequently accessed data. You can create storage volumes up to 32 TiB in size and mount them as iSCSI devices from your on-premises application servers. Each gateway configured for gateway-cached volumes can support up to 32 volumes and total volume storage per gateway of 1,024 TiB. Data written to these volumes is stored in Amazon S3, with only a cache of recently written and recently read data stored locally on your on-premises storage hardware.
Gateway-stored volumes store your locally sourced data in cache, while asynchronously backing up data to AWS. These volumes provide your on-premises applications with low-latency access to their entire datasets, while providing durable, off-site backups. You can create storage volumes up to 16 TiB in size and mount them as iSCSI devices from your on-premises application servers. Each gateway configured for gateway-stored volumes can support up to 32 volumes, with a total volume storage of 512 TiB. Data written to your gateway-stored volumes is stored on your on-premises storage hardware, and asynchronously backed up to Amazon S3 in the form of Amazon EBS snapshots.
A gateway-VTL allows you to perform offline data archiving by presenting your existing backup application with an iSCSI-based VTL consisting of a virtual media changer and virtual tape drives. You can create virtual tapes in your VTL by using the AWS Management Console, and you can size each virtual tape from 100 GiB to 5 TiB. A VTL can hold up to 1,500 virtual tapes, with a maximum aggregate capacity of 1 PiB. After the virtual tapes are created, your backup application can discover them using its standard media inventory procedure. Once created, tapes are available for immediate access and are stored in Amazon S3.
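The tape limits quoted above can be enforced client-side before creating virtual tapes. The sketch below is illustrative: the validation function simply encodes the 100 GiB to 5 TiB per-tape and 1,500-tape limits, and the `create_tapes` call assumes you have AWS credentials and a real gateway ARN (the ARN, client token, and barcode prefix shown are hypothetical placeholders).

```python
GIB = 1024 ** 3
TIB = 1024 ** 4

def validate_tape_request(tape_size_bytes: int, num_tapes: int) -> None:
    """Check a request against the gateway-VTL limits described above:
    100 GiB to 5 TiB per tape, at most 1,500 tapes in the library."""
    if not 100 * GIB <= tape_size_bytes <= 5 * TIB:
        raise ValueError("tape size must be between 100 GiB and 5 TiB")
    if not 1 <= num_tapes <= 1500:
        raise ValueError("a VTL holds at most 1,500 virtual tapes")

def create_virtual_tapes(gateway_arn: str, tape_size_bytes: int, num_tapes: int):
    """Sketch of creating virtual tapes via the Storage Gateway API
    (requires AWS credentials; values below are placeholders)."""
    import boto3
    validate_tape_request(tape_size_bytes, num_tapes)
    return boto3.client("storagegateway").create_tapes(
        GatewayARN=gateway_arn,
        TapeSizeInBytes=tape_size_bytes,
        ClientToken="example-token-0001",   # idempotency token
        NumTapesToCreate=num_tapes,
        TapeBarcodePrefix="TST",
    )
```

Once created this way, the tapes appear in your backup application's media inventory exactly as described above.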
Virtual tapes you need to access frequently should be stored in a VTL. Data that you don't need to retrieve frequently can be archived to your virtual tape shelf (VTS), which is stored in Amazon Glacier, further reducing your storage costs.
Organizations are using AWS Storage Gateway to support a number of use cases. These use cases include corporate file sharing, enabling existing on-premises backup applications to store primary backups on Amazon S3, disaster recovery, and mirroring data to cloud-based compute resources and then later archiving the data to Amazon Glacier.
With AWS Storage Gateway, you pay only for what you use. AWS Storage Gateway has the following pricing components: gateway usage (per gateway appliance per month) and data transfer out (per GB per month). Depending on the type of gateway appliance you use, there are also charges for snapshot storage usage (per GB per month) and volume storage usage (per GB per month) for gateway-cached and gateway-stored volumes, and for virtual tape shelf storage (per GB per month), virtual tape library storage (per GB per month), and retrieval from the virtual tape shelf (per GB) for a gateway-VTL. For information about pricing, see AWS Storage Gateway pricing.
Amazon S3 Transfer Acceleration (S3TA)
Amazon S3 Transfer Acceleration (S3TA) enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket. Transfer Acceleration leverages Amazon CloudFront's globally distributed edge locations. As data arrives at an AWS edge location, it is routed to your Amazon S3 bucket over an optimized network path.
Transfer Acceleration helps you fully utilize your bandwidth,
minimize the effect of distance on throughput, and ensure
consistently fast data transfer to Amazon S3 regardless of your
client’s location. Acceleration primarily depends on your
available bandwidth, the distance between the source and
destination, and packet loss rates on the network path.
Generally, you will see more acceleration when the source is farther from the destination, when there is more available bandwidth, and/or when the object size is bigger. You can use the online Amazon S3 Transfer Acceleration Speed Comparison tool to compare accelerated and non-accelerated upload speeds from your location.
Organizations use Transfer Acceleration on a bucket for a variety of reasons. For example, they may have customers all over the world who upload to a centralized bucket, they may regularly transfer gigabytes to terabytes of data across continents, or they may underutilize their available Internet bandwidth when uploading to Amazon S3. The best part about using Transfer Acceleration on a bucket is that the feature can be enabled with a single click in the Amazon S3 console; this makes the accelerate endpoint available to use in place of the regular Amazon S3 endpoint.
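The same feature can be enabled programmatically. The sketch below uses the standard S3 APIs for this (`put_bucket_accelerate_configuration` and the botocore `use_accelerate_endpoint` option); the bucket name would be your own, and the calls that touch AWS require valid credentials.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint used in place of the
    regular Amazon S3 endpoint once the feature is enabled."""
    return f"https://{bucket}.s3-accelerate.amazonaws.com"

def enable_acceleration(bucket: str) -> None:
    """Sketch of enabling Transfer Acceleration on a bucket
    (same effect as the console toggle; requires AWS credentials)."""
    import boto3
    boto3.client("s3").put_bucket_accelerate_configuration(
        Bucket=bucket,
        AccelerateConfiguration={"Status": "Enabled"},
    )

def accelerated_client():
    """Return an S3 client that routes requests through the
    accelerate endpoint instead of the regular S3 endpoint."""
    import boto3
    from botocore.config import Config
    return boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
```

After enabling the feature, uploads performed through the accelerated client enter the AWS network at the nearest edge location, as described above.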
With Transfer Acceleration, you pay only for what you use and for transferring data over the accelerated endpoint. Transfer Acceleration has the following pricing components: data transfer in (per GB), data transfer out (per GB), and data transfer between Amazon S3 and another AWS Region (per GB). Transfer Acceleration pricing is in addition to data transfer (per GB per month) pricing for Amazon S3. For information about pricing, see Amazon S3 pricing.
Amazon Data Firehose
Amazon Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3, Amazon Redshift, and Amazon OpenSearch Service.
You can use Data Firehose by creating a delivery stream and sending the data to it. The streaming data originators are called data producers. A producer can be as simple as a PutRecord() or PutRecordBatch() API call, or you can build your producers using Kinesis Agent. You can send a record (before base64-encoding) as large as 1000 KiB. Additionally, Firehose buffers incoming streaming data to a certain size called a Buffer Size (1 MiB to 128 MiB) or for a certain period of time called a Buffer Interval (60 to 900 seconds) before delivering to destinations.
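A minimal producer can enforce the per-record limit quoted above before calling the PutRecord API. The sketch below is illustrative: the stream name is a hypothetical placeholder, and the `send_record` function requires AWS credentials and an existing delivery stream.

```python
MAX_RECORD_BYTES = 1000 * 1024   # 1,000 KiB per record, before base64 encoding

def check_record(data: bytes) -> bytes:
    """Reject records above the Data Firehose per-record size limit."""
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError(f"record is {len(data)} bytes; limit is {MAX_RECORD_BYTES}")
    return data

def send_record(stream_name: str, data: bytes):
    """Sketch of a minimal producer using the PutRecord API
    (requires AWS credentials and an existing delivery stream)."""
    import boto3
    return boto3.client("firehose").put_record(
        DeliveryStreamName=stream_name,            # e.g. "example-stream" (placeholder)
        Record={"Data": check_record(data)},
    )
```

For higher throughput, the same check can be applied per record before batching with PutRecordBatch.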
With Amazon Data Firehose, you pay only for the volume of data you transmit through the service. Amazon Data Firehose has a single pricing component: data ingested (per GiB), which is calculated as the number of data records you send to the service, times the size of each record rounded up to the nearest 5 KiB. There may be additional charges for PUT requests and storage on Amazon S3 and Amazon Redshift, and for Amazon OpenSearch Service instance hours, depending on the destination you select for loading data. For information about pricing, see Amazon Data Firehose pricing.
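The 5 KiB rounding rule above means small records cost proportionally more than large ones. A short worked example of the metering arithmetic:

```python
import math

KIB = 1024

def billed_ingest_bytes(record_size: int) -> int:
    """Round a record's size up to the nearest 5 KiB, matching the
    ingestion metering rule described above."""
    return math.ceil(record_size / (5 * KIB)) * 5 * KIB

# Both a 1 KiB and a 4 KiB record are metered as 5 KiB;
# a 6 KiB record is metered as 10 KiB.
print(billed_ingest_bytes(1 * KIB), billed_ingest_bytes(6 * KIB))  # 5120 10240
```

Batching many small events into fewer, larger records therefore reduces the metered ingestion volume.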
AWS Transfer Family
If you are looking to modernize your file transfer workflows for business processes that are heavily dependent on FTP, SFTP, and FTPS, the AWS Transfer Family service provides fully managed file transfers into and out of Amazon S3 buckets and Amazon EFS shares. The AWS Transfer Family uses a highly available multi-AZ architecture that automatically scales to add capacity based on your file transfer demand. This means no more FTP, SFTP, and FTPS servers to manage. The AWS Transfer Family allows users to be authenticated through multiple methods, including service-managed identities, AWS Directory Service, on-premises Active Directory systems through AWS Managed Microsoft AD connectors, or custom identity providers. Custom identity providers can be configured through Amazon API Gateway, enabling custom configurations. DNS entries used by existing users, partners, and applications are maintained using Amazon Route 53 for minimal disruption and seamless migration. With your data residing in Amazon S3 or Amazon EFS, you can use other AWS services for analytics and data processing workflows.
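Creating a managed endpoint is a single API call. The sketch below uses the Transfer Family `create_server` API with service-managed identities and Amazon S3 storage; it requires AWS credentials and appropriate permissions, and the small validation helper simply encodes the protocols discussed above.

```python
SUPPORTED_PROTOCOLS = {"SFTP", "FTP", "FTPS"}

def validate_protocols(protocols: list) -> list:
    """Ensure only the file transfer protocols discussed above are requested."""
    unknown = set(protocols) - SUPPORTED_PROTOCOLS
    if unknown:
        raise ValueError(f"unsupported protocols: {sorted(unknown)}")
    return protocols

def create_sftp_server():
    """Sketch of creating a managed SFTP endpoint backed by Amazon S3 with
    service-managed identities (requires AWS credentials and permissions)."""
    import boto3
    return boto3.client("transfer").create_server(
        Protocols=validate_protocols(["SFTP"]),
        IdentityProviderType="SERVICE_MANAGED",  # or API_GATEWAY for a custom provider
        Domain="S3",                             # store transferred files in Amazon S3
    )
```

Users and their home directories are then added to the returned server, after which existing SFTP clients can connect without changes.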
There are many use cases that require a standards-based file transfer protocol like FTP, SFTP, or FTPS. AWS Transfer Family is a good fit for secure file sharing between an organization and third parties. Examples of data shared between organizations include large files such as audio/video media files, technical documents, research data, and EDI data such as purchase orders and invoices. Another use case is providing a central location where users can securely download and globally access your data. A third use case is facilitating data ingestion for a data lake: organizations and third parties can use FTP, SFTP, or FTPS to transfer research, analytics, or business data into an Amazon S3 bucket, where it can be further processed and analyzed.
With the AWS Transfer Family, you pay only for the protocols you have enabled for access to your endpoint and for the amount of data transferred over each of those protocols. There are no upfront costs and no resources to manage yourself. You select the protocols, identity provider, and endpoint configuration to enable transfers over the chosen protocols. You are billed on an hourly basis for each protocol enabled on your endpoint, until the time you delete it, and for the amount of data (in gigabytes) uploaded and downloaded over each protocol. For details on pricing per Region, see AWS Transfer Family pricing.
Third-Party Connectors
Many of the most popular third-party backup software packages, such as CommVault Simpana and Veritas NetBackup, include Amazon S3 connectors. This allows the backup software to point directly to the cloud as a target while still keeping the backup job catalog complete. Existing backup jobs can simply be rerouted to an Amazon S3 target bucket, and the incremental daily changes are passed over the Internet. Lifecycle management policies can move data from Amazon S3 into lower-cost storage tiers for archival status or deletion. Eventually, and invisibly, local tape and disk copies can be aged out of circulation and tape and tape automation costs can be entirely removed.
These connectors can be used alone, or they can be used with a gateway provided by AWS Storage Gateway to back up to the cloud without affecting or re-architecting existing on-premises processes. Backup administrators will appreciate the integration into their daily console activities, and cloud architects will appreciate the behind-the-scenes job migration into Amazon S3.