Large data migration with AWS Snow Family devices
Large data migration from on-premises locations requires careful planning, orchestration, and execution to ensure that your data is successfully migrated to AWS.
We recommend that you have a data migration strategy in place before you start your migration, to avoid missed deadlines, budget overruns, and migration failures. AWS Snow services help you place, order, and track your large data migration projects through the Snow Family Large Data Migration Manager (LDMM) feature in the AWS Snow Family Management Console.
The topics Planning your large transfer with Snow Family devices and Calibrating a large transfer with Snow Family devices describe a manual data migration process. You can streamline the manual steps by using the Snow Family LDMM migration plan.
Planning your large transfer with Snow Family devices
We recommend that you plan and calibrate large data transfers between the AWS Snowball Edge devices that you have on site and your servers using the guidelines in the following sections.
Step 1: Understand what you're moving to the cloud
Before you create your first job using the AWS Snow Family Management Console, assess the volume of data that you need to transfer, where it is currently stored, and the destination that you want to transfer it to. For data transfers that are a petabyte in scale or larger, this administrative housekeeping makes the transfer much easier to manage when your Snow Family devices arrive.
If you're migrating data into the AWS Cloud for the first time, we recommend that you design a cloud migration model. Cloud migration doesn’t happen overnight. It requires a careful planning process to ensure that all systems work as expected.
When you're done with this step, you should know the total amount of data that you're going to move into the cloud.
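As a starting point for this assessment, the following minimal sketch totals the size of the files under a source directory. The function names `total_bytes` and `to_tb` are hypothetical, and the sketch assumes that the data is reachable as a local or mounted file system and that 1 TB means 10^12 bytes.

```python
import os


def total_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out of the total.
                continue
    return total


def to_tb(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (assumption: 1 TB = 10**12 bytes)."""
    return num_bytes / 10**12
```

Running `to_tb(total_bytes("/path/to/data"))` for each data source and adding the results gives a rough total to carry into the later steps.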
Step 2: Calculate your target transfer rate
It's important to estimate how quickly you can transfer data to the Snow Family devices that are connected to each of your servers. This estimated speed, in MB/s, determines how fast you can transfer the data from your data source to Snowball Edge devices using your local network infrastructure.
Note
For large data transfers, we recommend using the Amazon S3 data transfer method. You must select this option when you order devices in the AWS Snow Family Management Console.
To determine a baseline transfer rate, transfer a small subset of your data to the Snowball Edge device, or transfer a 10 GB sample file and observe the throughput.
While determining your target transfer speed, keep in mind that you can improve throughput by tuning your environment. Factors that you can adjust include the network configuration and speed, the size of the files being transferred, and the speed at which data can be read from your local servers. The Amazon S3 adapter copies data to Snow Family devices as quickly as your conditions allow.
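The baseline measurement can be reduced to simple arithmetic: bytes moved divided by elapsed time. The following is a minimal sketch, not part of any AWS tool. The names `throughput_mb_per_s` and `timed_throughput` are hypothetical, and it assumes 1 MB means 10^6 bytes.

```python
import time
from typing import Callable


def throughput_mb_per_s(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Return the observed rate in MB/s (assumption: 1 MB = 10**6 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_transferred / 1_000_000 / elapsed_seconds


def timed_throughput(transfer: Callable[[], int]) -> float:
    """Time a caller-supplied transfer function that returns the bytes it moved."""
    start = time.perf_counter()
    moved = transfer()
    elapsed = time.perf_counter() - start
    return throughput_mb_per_s(moved, elapsed)


# Example: a 10 GB sample file that takes 40 seconds to copy.
sample_rate = throughput_mb_per_s(10 * 10**9, 40)  # 250.0 MB/s
```

Pass `timed_throughput` a function that performs the sample copy with whatever tool you use and returns the number of bytes copied. Repeating the measurement after each tuning change shows whether the change helped.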
Step 3: Determine how many Snow Family devices you need
Using the total amount of data that you plan to move into the cloud, the estimated transfer speed, and the number of days that you want to allow to move the data into AWS, determine how many Snow Family devices you need for your large-scale data migration. Depending on the device type, Snowball Edge devices have approximately 39.5 TB, 80 TB, or 210 TB of usable storage space. For example, if you want to move 300 TB of data to AWS over 10 days and you have a transfer speed of 250 MB/s, you need 4 Snowball Edge devices with 80 TB of usable storage each (300 TB divided by 80 TB, rounded up). When less than 40 TB of data remains to transfer, AWS Snowcone devices (with 14 TB of usable space) are recommended.
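The arithmetic behind an estimate like this can be sketched as follows. This is an illustration under stated assumptions, not the calculation that the LDMM performs: the device count is driven by usable capacity alone, 1 TB is 10^6 MB, and the transfer rate applies to each device independently when devices are loaded in parallel. The source does not say whether the 250 MB/s figure is per device or aggregate, so treat the time estimate as a rough check. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def devices_by_capacity(total_tb: float, usable_tb_per_device: float) -> int:
    """Smallest number of devices whose combined usable space holds the data."""
    return math.ceil(total_tb / usable_tb_per_device)


def days_to_transfer(total_tb: float, rate_mb_per_s: float,
                     parallel_devices: int = 1) -> float:
    """Days needed to copy total_tb, assuming each device sustains rate_mb_per_s."""
    total_mb = total_tb * 1_000_000  # assumption: decimal units
    seconds = total_mb / (rate_mb_per_s * parallel_devices)
    return seconds / SECONDS_PER_DAY


count = devices_by_capacity(300, 80)          # 4
copy_days = days_to_transfer(300, 250, count)  # days to load 4 devices in parallel
```

Compare the estimated copy time with the number of days you have allowed. If it doesn't fit, either raise the transfer rate through tuning or load more devices at the same time.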
Note
The Snow Family LDMM provides a wizard to estimate the number of Snow Family devices that can be supported concurrently. For more information, see Creating a large data migration plan with Snow Family devices.
Step 4: Create your jobs
After you know how many Snow Family devices you need, create an import job for each device. The Snow Family LDMM simplifies the creation of multiple jobs. For more information, see Placing your next job order.
Note
You can place your next job order and automatically add it to your plan directly from the Recommended job ordering schedule. For more information, see Recommended job ordering schedule.
Step 5: Separate your data into transfer segments
As a best practice for large data transfers involving multiple jobs, we recommend that you logically split your data into a number of smaller, more manageable data sets. This allows you to transfer one partition at a time, or multiple partitions in parallel. When planning your partitions, make sure that the combined data for the partitions fits on the Snow Family devices for the job. For example, you can separate your transfer into partitions in any of the following ways:
- You can create 10 partitions of 8 TB each for a Snowball Edge.
- For large files, each file can be an individual partition up to the 5 TB size limit for objects in Amazon S3.
- Each partition can be a different size, and each individual partition can be made up of the same kind of data: for example, small files in one partition, compressed archives in another, large files in another partition, and so on. This approach can help you to determine your average transfer rate for different types of files.
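One simple way to plan partitions is to walk an ordered file list and start a new partition whenever the next file would exceed a size cap. The following is a minimal sketch of that approach. The function name `plan_segments` and the cap value are hypothetical, and sequential packing is only one of several reasonable strategies.

```python
from typing import Iterable, List, Tuple


def plan_segments(files: Iterable[Tuple[str, int]],
                  max_segment_bytes: int) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs, in order, into size-capped segments."""
    segments: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if size > max_segment_bytes:
            raise ValueError(f"{name} is larger than the segment cap")
        # Close the current segment if this file would push it over the cap.
        if current and current_size + size > max_segment_bytes:
            segments.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        segments.append(current)
    return segments
```

Sorting the input by file type or directory before calling `plan_segments` keeps similar data together, which matches the suggestion to give each partition one kind of data.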
Note
Metadata operations are performed for each file that's transferred. Regardless of a file's size, this overhead remains the same. Therefore, you get faster performance by compressing small files into a larger bundle, batching your files, or transferring larger individual files.
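Bundling small files before the transfer can be sketched with Python's standard `tarfile` module. The function name `bundle_small_files` and the 1 MB threshold are assumptions chosen for illustration; pick a threshold that suits your data.

```python
import tarfile
from pathlib import Path


def bundle_small_files(src_dir: str, archive_path: str,
                       max_file_bytes: int = 1_000_000) -> int:
    """Add every file at or below max_file_bytes to a gzip-compressed tar archive.

    Returns the number of files bundled. Paths inside the archive are stored
    relative to src_dir.
    """
    src = Path(src_dir)
    archive = Path(archive_path).resolve()
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if not path.is_file() or path.resolve() == archive:
                continue  # skip directories and the archive itself
            if path.stat().st_size <= max_file_bytes:
                tar.add(str(path), arcname=str(path.relative_to(src)))
                count += 1
    return count
```

Files above the threshold are left out of the archive so that they can be transferred individually.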
Creating data transfer segments makes it easier to resolve transfer issues quickly, because troubleshooting a large, heterogeneous transfer after it has run for a day or more can be complex.
When you've finished planning your petabyte-scale data transfer, we recommend that you transfer a few segments onto the Snow Family device from your server to calibrate your speed and total transfer time.