AWS Snowball
User Guide

This guide is for the Snowball (50 TB or 80 TB of storage space). If you are looking for documentation for the Snowball Edge, see the AWS Snowball Edge Developer Guide.

Best Practices for AWS Snowball

This checklist is intended to help you get the maximum benefit from and satisfaction with AWS Snowball (Snowball).

Security

  • If you notice anything that looks suspicious about the Snowball, don't connect it to your internal network. Instead, contact AWS Support, and a new Snowball will be shipped to you.

  • We recommend that you don't save a copy of the unlock code in the same location on the workstation as the manifest for that job. Saving them separately helps prevent unauthorized parties from gaining access to the Snowball. For example, you can save a copy of the manifest to the workstation and email the unlock code to the AWS Identity and Access Management (IAM) user who will perform the data transfer from the workstation. This approach limits access to the Snowball to individuals who have access to both the files saved on the workstation and that IAM user's email.

  • Whenever you transfer data between your on-premises data centers and a Snowball, logs are automatically generated and saved to your workstation. These logs are saved in plaintext format and can contain file name and path information for the files that you transfer. To protect this potentially sensitive information, we strongly suggest that you delete these logs after the job that they're associated with enters the Completed status. For more information about logs, see Snowball Logs.
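
As a minimal cleanup sketch: the log directory below is an assumption, not a documented default. The Snowball client reports the actual location of each log it writes, so substitute your own path.

    #!/usr/bin/env bash
    # Delete the plaintext transfer logs once the associated job has
    # entered the Completed status. LOG_DIR is a hypothetical location;
    # use the path that the Snowball client reported for your logs.
    LOG_DIR="$HOME/snowball-logs"
    rm -f -- "$LOG_DIR"/*.log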

Network

  • Your workstation should be the local host for your data. For performance reasons, we don't recommend reading files across a network when using Snowball to transfer data. If you must transfer data across a network, batch the files into a local cache first, and then copy that cache to the Snowball so that the copy operation can proceed as fast as possible.

  • Because the workstation is considered to be the bottleneck for transferring data, we highly recommend that your workstation be a powerful computer, able to meet high demands in terms of processing, memory, and networking. For more information, see Workstation Specifications.

  • You can run simultaneous instances of the Snowball client in multiple terminals, each using the copy operation to speed up your data transfer, as shown in the sketch after this list. For more information about using the Snowball client, see Commands for the Snowball Client.

  • To prevent corrupting your data, do not disconnect the Snowball or change its network settings while transferring data.

  • Files must be in a static state while being copied. Files that are modified while they are being transferred will not be imported into Amazon S3.
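
As a minimal sketch of the simultaneous-instances technique mentioned above, assuming a Linux or macOS workstation and placeholder paths and bucket name, you could launch each copy operation as a background job instead of opening separate terminal windows:

    #!/usr/bin/env bash
    # Three Snowball client copy operations running at the same time.
    # Each background job stands in for a separate terminal window.
    # Each copy should operate on a distinct set of files.
    snowball cp -r /data/batch1 s3://mybucket &
    snowball cp -r /data/batch2 s3://mybucket &
    snowball cp -r /data/batch3 s3://mybucket &
    wait    # block until all three copies finish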

Resource Management

  • The 10 free days for performing your on-premises data transfer start the day after the Snowball arrives at your data center and end when you ship the appliance back to AWS.

  • The Job created status is the only status in which you can cancel a job. When a job changes to a different status, it can’t be canceled.

  • For import jobs, don't delete your local copies of the transferred data until the import into Amazon Simple Storage Service (Amazon S3) is successful at the end of the process and you can verify the results of the data transfer.

Performance for AWS Snowball

Following, you can find information about AWS Snowball performance. Here, we discuss performance in general terms, because each on-premises environment does things differently: different network technologies, different hardware, different operating systems, different procedures, and so on. To provide meaningful guidance about data transfer performance, the following sections discuss how to determine when to use Snowball instead of data transfer over the Internet, and how to speed up transfer from your data source to the Snowball.

Performance Recommendations

The following recommendations are strongly suggested, because they have the largest impact on the performance of your data transfer.

  • We recommend that you use a powerful computer as your workstation. Because the workstation from which or to which you transfer data is considered to be the bottleneck for the transfer, it should be able to meet high demands in terms of processing, memory, and networking. For more information, see Workstation Specifications.

  • We recommend that you have no more than 500,000 files or directories within each directory.

  • We recommend that all files transferred to a Snowball be no smaller than 1 MB in size. If you have many files smaller than 1 MB each, we recommend that you zip them into larger archives before transferring them onto a Snowball, as in the sketch after this list.
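
As a minimal sketch of the archiving recommendation above, assuming a Linux or macOS workstation and placeholder paths, you could bundle a directory of small files into one compressed archive and transfer that instead:

    #!/usr/bin/env bash
    # Bundle many sub-1-MB files into a single archive, then copy the
    # archive in one operation. Paths and bucket name are placeholders.
    tar -czf /data/small-files.tar.gz -C /data/small-files .
    snowball cp /data/small-files.tar.gz s3://mybucket

Keep in mind that the archive itself, not its individual contents, is what is imported into Amazon S3, so extract the files in the cloud if you need them in their original state.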

Speeding Up Data Transfer

In general, you can improve the transfer speed from your data source to the Snowball in the following ways, ordered from largest to smallest positive impact on performance:

  1. Perform multiple copy operations at one time – If your workstation is powerful enough, you can perform multiple snowball cp commands at one time. You can do this by running each command from a separate terminal window, in separate instances of the Snowball client, all connected to the same Snowball.

  2. Copy from multiple workstations – A single Snowball can be connected to multiple workstations. Each workstation can host a separate instance of the Snowball client.

  3. Transfer large files or batch small files together – Each copy operation has some overhead because of encryption. Therefore, performing many snowball cp commands on individual files has slower overall performance than transferring the same number of files in a single command. To speed the process up, batch files together in a single snowball cp command. You can do this by copying entire directories of files, or by bundling the files together into larger archives. Because there is overhead for each snowball cp command, we don't recommend that you queue a large number of individual copy commands. Queuing many commands has a significant negative impact on your transfer performance.

    For example, say you have a directory called C:\MyFiles that only contains three files, file1.txt, file2.txt, and file3.txt. Suppose that you issue the following three commands.

    snowball cp C:\MyFiles\file1.txt s3://mybucket
    snowball cp C:\MyFiles\file2.txt s3://mybucket
    snowball cp C:\MyFiles\file3.txt s3://mybucket

    In this scenario, you have three times as much overhead as if you transferred the entire directory with the following copy command.

    snowball cp -r C:\MyFiles\* s3://mybucket

  4. Don't perform other operations on files during transfer – Renaming files during transfer, changing their metadata, or writing data to the files during a copy operation has a significant negative impact on transfer performance. We recommend that your files remain in a static state while you transfer them.

  5. Reduce local network use – Because the Snowball communicates across your local network, reducing or otherwise eliminating other local network traffic between the Snowball, the switch it's connected to, and the workstation that hosts your data source can result in a significant improvement of data transfer speeds.

  6. Eliminate unnecessary hops – If you set up your Snowball, your data source, and your workstation so that they're the only machines communicating across a single switch, it can result in a significant improvement of data transfer speeds.

Experimenting to Get Better Performance

Because your performance results vary based on your hardware, your network, the number and size of your files, and how they're stored, we suggest that you experiment if you're not getting the performance that you'd like to see.

First, attempt multiple copy operations until you see a reduction in overall transfer performance. Performing multiple copy operations at once can significantly improve your overall transfer performance. For example, say you have a single snowball cp command running in a terminal window, and you note that it's transferring data at 30 MB/second. You open a second terminal window and run a second snowball cp command on another set of files that you want to transfer, and you note that both commands are now performing at 30 MB/second. In this case, your total transfer performance is 60 MB/second.

Now, connect to the Snowball from a separate workstation, and run the Snowball client from that workstation to execute a third snowball cp command on another set of files that you want to transfer. When you check the performance, you note that all three instances of the snowball cp command are operating at 25 MB/second, for a total performance of 75 MB/second. Even though the individual performance of each instance has decreased in this example, the overall performance has increased.

Experimenting in this way, using the techniques listed in Speeding Up Data Transfer, will help you optimize your data transfer performance.
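
As a rough way to run this experiment, assuming a Linux or macOS workstation, placeholder paths, and a set of similarly sized test directories, you could time the same batch of data at increasing levels of parallelism and compare the aggregate rate:

    #!/usr/bin/env bash
    # Hypothetical experiment: copy N test directories in parallel and
    # time the whole batch. Rerun with a larger N until the aggregate
    # rate stops improving. Directory layout and bucket are placeholders.
    N=${1:-2}
    start=$(date +%s)
    for i in $(seq 1 "$N"); do
        snowball cp -r "/data/testset$i" s3://mybucket &
    done
    wait    # block until every parallel copy finishes
    elapsed=$(( $(date +%s) - start ))
    echo "Copied $N test directories in $elapsed seconds."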

Why AWS Snowball Has Such High Hardware Specifications for Workstations

As outlined in Workstation Specifications, Snowball has stringent hardware specifications for the workstations that are used to transfer data to and from a Snowball. These hardware specifications are mainly based on security requirements for the service. When data is transferred to a Snowball, a file is loaded into the workstation's memory. While in memory, that file is fully encrypted by either the Snowball client or the Amazon S3 Adapter for Snowball. Once the file has been encrypted, chunks of the encrypted file are sent to the Snowball. At no point is any data stored to disk. All data is kept in memory, and only encrypted data is sent to the Snowball. This loading into memory, encrypting, chunking, and sending to the Snowball is both CPU- and memory-intensive.

Performance Considerations for HDFS Data Transfers

When getting ready to transfer data from a Hadoop Distributed File System (HDFS) cluster (version 2.x) into a Snowball, we recommend that you follow the guidance in the previous section, and also the following tips:

  • Don't copy the entire cluster over in a single command – Transferring an entire cluster in a single command can cause performance issues, including slow transfers, "flipped" bits, and missing or corrupted data on the Snowball. In this case, we recommend that you separate the data transfer into multiple parts.

  • Don't transfer a large number of small files – If you have a large number of files, say over a thousand, and those files are small, say under 1 MB each, then transferring them all at once has a negative impact on your performance. This performance degradation is due to per-file overhead associated with transferring data from HDFS clusters. If you must transfer a large number of small files, we recommend that you collect them into larger archive files and then transfer those, as in the sketch after this list. However, the archives themselves are what is imported into Amazon S3. Thus, if you want the files in their original state, you need to take them out of the archives after the archives are in the cloud.
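
As a minimal sketch of the HDFS archiving tip above, assuming a Linux workstation with the Hadoop client installed, placeholder HDFS and local paths, and enough staging disk space for the archive, you could pull a directory of small files out of HDFS, bundle it, and copy the single archive:

    #!/usr/bin/env bash
    # Stage a directory of small HDFS files locally, archive it, and
    # copy the one archive to the Snowball. All paths are placeholders.
    hdfs dfs -get /user/hadoop/small-files /staging/small-files
    tar -czf /staging/small-files.tar.gz -C /staging small-files
    snowball cp /staging/small-files.tar.gz s3://mybucket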
