AWS DataSync
User Guide

Creating a Task

If this is your first time using DataSync, the instructions in Getting Started with AWS DataSync walk you through the process of creating a task.

You can create tasks that transfer data between different source and destination location types. The following table shows the supported combinations.

Source (From)                            Destination (To)

On-premises NFS file system              Amazon EFS file system
On-premises NFS file system              Amazon S3
Amazon EFS                               On-premises NFS file system
Amazon S3                                On-premises NFS file system
In-cloud NFS file system or Amazon EFS   Amazon S3
Amazon S3                                In-cloud NFS file system or Amazon EFS

When you initially create a task, it enters the CREATING status. During the CREATING status, AWS DataSync attempts to mount the NFS location. The task transitions to the AVAILABLE status without waiting for the AWS location to become available. If necessary, AWS DataSync mounts the AWS location before every task execution and then unmounts it after every task execution. If an agent that is associated with an NFS location goes offline, the task transitions to the UNAVAILABLE status.

If the task remains in the CREATING status for more than a few minutes, your agent might be having trouble mounting the source NFS file system. You can check the task's ErrorCode and ErrorDetail values, which the DescribeTask operation returns; for details, see the API Reference section in this guide. Mount issues are often caused by either a misconfigured firewall or a mistyped NFS server hostname. For troubleshooting information, see Troubleshooting AWS DataSync Issues.
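The status check described above can also be scripted. The following is a minimal sketch, assuming a client object that exposes a describe_task method shaped like the DataSync DescribeTask operation (for example, boto3.client("datasync")). The function name, polling interval, and timeout are illustrative and are not part of DataSync.

```python
import time


def wait_for_task(client, task_arn, timeout_s=300, poll_s=10, sleep=time.sleep):
    """Poll DescribeTask until the task leaves the CREATING status.

    `client` is assumed to behave like boto3.client("datasync").
    Returns the task status once it is no longer CREATING. Raises
    RuntimeError with the reported ErrorCode and ErrorDetail if the task
    is UNAVAILABLE or is still CREATING after `timeout_s` seconds.
    """
    waited = 0
    while True:
        task = client.describe_task(TaskArn=task_arn)
        status = task["Status"]
        if status == "CREATING" and waited < timeout_s:
            # Still mounting the NFS location; wait and check again.
            sleep(poll_s)
            waited += poll_s
            continue
        if status in ("CREATING", "UNAVAILABLE"):
            # ErrorCode and ErrorDetail are optional in the response.
            code = task.get("ErrorCode", "unknown")
            detail = task.get("ErrorDetail", "no detail reported")
            raise RuntimeError(f"task is {status}: {code}: {detail}")
        return status
```

A caller would typically pass the task ARN returned when the task was created, and treat a RuntimeError as a prompt to check the firewall configuration and the NFS server hostname.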

Creating a Task to Transfer Data Between On-Premises NFS and AWS

If you have previously created a task and want to create additional tasks, use the following procedure.

To create a task

  1. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

  2. On the navigation pane, choose Tasks, and then choose Create task.

  3. On the Configure source location page, choose Create new location and configure a new location if you want to use a new location for your source. Provide the configuration settings and choose Next. For instructions on how to create a location, see Working with Locations.

    If you want to use a source location that you previously created, choose Choose existing location, choose your source location from the list, and then choose Next.

    For step-by-step instructions, see Configure a Source Location.

Creating a Task to Transfer Between In-Cloud Locations

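Use the following instructions to set up the DataSync Amazon EC2 agent for data transfers. The examples in this section cover these use cases:

  • Creating a Task to Transfer from In-Cloud NFS to In-Cloud NFS or S3

  • Creating a Task to Transfer from S3 to In-Cloud NFS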

Creating a Task to Transfer from In-Cloud NFS to In-Cloud NFS or S3

Use the following instructions to transfer data from an in-cloud NFS file system to AWS. To perform this transfer, the DataSync agent must be located in the same AWS Region and same AWS account where the file system is deployed. This type of transfer includes transfers from EFS to EFS, transfers from self-managed NFS to Amazon EFS, and transfers to S3. For information about how in-cloud NFS to in-cloud NFS or Amazon S3 works, see Transfer Data from In-Cloud NFS to In-Cloud NFS or S3.

Note

Deploy the agent in the AWS Region and AWS account where the source EFS or self-managed NFS file system resides.

Deploying Your DataSync Agent as an EC2 Instance to Read Files from In-Cloud

To deploy the DataSync agent as an EC2 instance

  1. From the AWS account where the source EFS resides, launch the agent using your Amazon Machine Image (AMI) from the Amazon EC2 launch wizard. Use the following URL to launch the AMI.

    https://console.aws.amazon.com/ec2/v2/home?region=source-efs-or-nfs-region#LaunchInstanceWizard:ami=ami-id

    In the URL, replace source-efs-or-nfs-region and ami-id with your own values.

    After the AMI launches, the Choose an Instance Type page appears on the Amazon EC2 console. For a list of AMI IDs by AWS Region, see Deploy Your Agent as an EC2 Instance to Read Files from In-Cloud.

  2. Choose one of the recommended instance types for your use case, and choose Next: Configure Instance Details. For the recommended instance types, see Amazon EC2 Instance Requirements.

  3. On the Configure Instance Details page, do the following:

    1. For Network, choose the VPC where your source EFS or NFS is located.

    2. Choose a value for Auto-assign Public IP. If you want your instance to be accessible from the public internet, set Auto-assign Public IP to Enable. Otherwise, set Auto-assign Public IP to Disable. If a public IP address isn't assigned, activate the agent in your VPC using its private IP address.

      When you transfer files from an in-cloud NFS, to increase performance, we recommend that you choose the Placement Group where your NFS server resides.

  4. Choose Next: Add Storage. The agent doesn't require additional storage, so you can skip this step and choose Next: Add tags.

  5. (Optional) On the Add Tags page, you can add tags to your EC2 instance. When you're finished on the page, choose Next: Configure Security Group.

  6. On the Configure Security Group page, do the following:

    1. Make sure that the selected security group allows inbound access to HTTP port 80 from the web browser that you plan to use to activate the agent.

    2. Make sure that the security group of the source EFS or NFS allows inbound traffic from the agent. In addition, make sure that the agent's security group allows outbound traffic to the source EFS or NFS. The traffic goes through the standard NFS port, 2049.

    For the complete set of network requirements for DataSync, see Network Requirements.

  7. Choose Review and Launch to review your configuration, then choose Launch to launch your instance. Remember to use a key pair that's accessible to you. A confirmation page appears and indicates that your instance is launching.

  8. Choose View Instances to close the confirmation page and return to the EC2 instances screen. When you launch an instance, its initial state is pending. After the instance starts, its state changes to running. At this point, it is assigned a public Domain Name System (DNS) name and IP address, which can be found in the Description tab.

  9. If you set Auto-assign Public IP to Enable, choose your instance and note the public IP address in the Description tab. You use this IP address later to connect to your sync agent.

    If you set Auto-assign Public IP to Disable, launch or use an existing instance in your VPC to activate the agent. In this case, you use the private IP address of the sync agent to activate the agent from this instance in the VPC.
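The launch URL from step 1 and the security group rules from step 6 can be written down as data. The following is a minimal sketch; the helper names are hypothetical, and the rule dictionaries assume the IpPermissions shape used by the Amazon EC2 AuthorizeSecurityGroupIngress operation. The Region, AMI ID, CIDR range, and security group ID are placeholders that you supply.

```python
def launch_wizard_url(region, ami_id):
    """Build the EC2 launch wizard URL for the agent AMI (step 1)."""
    return (
        f"https://console.aws.amazon.com/ec2/v2/home?region={region}"
        f"#LaunchInstanceWizard:ami={ami_id}"
    )


def activation_ingress(browser_cidr):
    """Inbound rule for the agent: HTTP port 80 from the activating browser."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 80,
        "ToPort": 80,
        "IpRanges": [{"CidrIp": browser_cidr}],
    }]


def nfs_ingress(agent_security_group_id):
    """Inbound rule for the source EFS or NFS: port 2049 from the agent."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": 2049,
        "ToPort": 2049,
        "UserIdGroupPairs": [{"GroupId": agent_security_group_id}],
    }]
```

Each returned list is intended to be passed as the IpPermissions argument of authorize_security_group_ingress on a boto3 EC2 client, together with the GroupId of the security group being modified.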

Creating a Task to Transfer Data from EFS or Self-Managed NFS

Next, you create a task to transfer data.

Note

Create the task in the AWS Region and AWS account where the destination EFS or S3 bucket resides.

To create a task

  1. Open the DataSync console in the AWS Region where your destination Amazon EFS file system is located. The destination EFS or S3 bucket must be in the same AWS account.

  2. Choose Create task, then choose On-premises to AWS on the Use case options page, and then choose Create agent.

  3. In the Create agent wizard's Activation section, enter the EC2 instance's IP address for Agent address, and then choose Get key. This IP address can be private or public. For more details, see step 9 of To deploy the DataSync agent as an EC2 instance.

    Your browser connects to this IP address to get a unique activation key from your agent. This key securely associates your agent with your AWS account. This IP address doesn't need to be accessible from outside your network, but must be accessible from your browser.

  4. Enter an agent name that you can easily identify later, and choose Create agent when done. You can optionally add tags to the agent.

  5. Choose Tasks from the navigation pane.

  6. Choose On-premises to AWS, and choose Next to open the Source configuration page.

  7. In the Source location options, choose Create new location and choose Network File System (NFS). Fill in the following options:

    • For agent, choose your newly created agent from the list.

    • If you are copying from EFS, do the following:

      • For NFS Server, enter the DNS name of your source EFS.

      • For Mount path, enter / (forward slash) and choose Next.

    • If you are copying from self-managed NFS, do the following:

      • For NFS Server, enter the private DNS or IP address of your source NFS.

      • For Mount path, enter a path that is exported by your NFS server and choose Next. For more information, see Create an NFS Location.

  8. Choose Create new location. This is the destination location for your data transfer. Fill in the following options:

    • If you are copying to EFS, do the following:

      • For Location type, choose EFS.

      • Choose your destination EFS.

      • For Mount path, enter / (forward slash).

      • For Subnet and Security groups, use the default settings and choose Next.

    • If you are copying to S3, do the following:

      • For Location type, choose Amazon S3 bucket.

      • For S3 bucket, choose your destination S3 bucket.

      • For Folder, choose a folder prefix to use for the transfer, or you can keep it blank.

        DataSync can autogenerate an AWS Identity and Access Management (IAM) role to access your bucket, or you can create your own.

  9. Choose Next, and optionally name the task and add tags.

  10. Choose or create an Amazon CloudWatch Logs log group at the bottom of the page, and choose Next. For more information on working with CloudWatch Logs, see Allowing DataSync to Upload Logs to Amazon CloudWatch Log Groups.

  11. Review the settings on the next page, and choose Create task.

  12. Choose Start to run the task that you just created to start transferring data.
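The location settings chosen in steps 7 and 8 correspond to request parameters in the DataSync API. The following is a minimal sketch that builds those parameters as plain dictionaries, assuming the parameter names of the CreateLocationNfs, CreateLocationEfs, and CreateLocationS3 operations. The helper names are hypothetical, and all ARNs and host names are placeholders that you supply.

```python
def nfs_source(agent_arn, server, mount_path="/"):
    """Parameters for an NFS source location served by the EC2 agent.

    For EFS, `server` is the DNS name of the file system and the mount
    path is "/". For self-managed NFS, `server` is the private DNS name
    or IP address and `mount_path` is a path exported by the server.
    """
    return {
        "ServerHostname": server,
        "Subdirectory": mount_path,
        "OnPremConfig": {"AgentArns": [agent_arn]},
    }


def efs_destination(efs_arn, subnet_arn, security_group_arns, mount_path="/"):
    """Parameters for an Amazon EFS destination location."""
    return {
        "EfsFilesystemArn": efs_arn,
        "Subdirectory": mount_path,
        "Ec2Config": {
            "SubnetArn": subnet_arn,
            "SecurityGroupArns": list(security_group_arns),
        },
    }


def s3_destination(bucket_arn, role_arn, folder=None):
    """Parameters for an Amazon S3 destination location."""
    params = {
        "S3BucketArn": bucket_arn,
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }
    if folder:
        # The folder prefix is optional; omit it to use the bucket root.
        params["Subdirectory"] = folder
    return params
```

Each dictionary is intended to be unpacked as keyword arguments, for example client.create_location_nfs(**nfs_source(...)) on a boto3 DataSync client. Unlike the console, the API does not autogenerate the IAM role, so the role ARN must already exist.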

Creating a Task to Transfer from S3 to In-Cloud NFS

Use the following instructions to transfer data from S3 to an in-cloud NFS file system that is located in the same AWS account and AWS Region where the agent is deployed. This approach includes transfers from S3 to EFS, or from S3 to self-managed NFS. For information about how S3 to in-cloud NFS works, see Transfer from S3 to In-Cloud NFS.

Deploying the DataSync EC2 Agent to Write to your Destination Location

First, deploy the DataSync EC2 agent in the AWS Region and AWS account where the destination EFS or self-managed NFS resides.

To deploy the agent

  • Launch the agent from the selected AMI by using the EC2 launch wizard. To do so, use the following URL.

    https://console.aws.amazon.com/ec2/v2/home?region=DESTINATION-EFS-or-NFS-REGION#LaunchInstanceWizard:ami=AMI-ID

    In the URL, replace the AWS Region and AMI ID with your own. You are redirected to the Choose an Instance Type page on the EC2 console. For a list of AMI IDs by AWS Region, see Deploy Your Agent as an EC2 Instance to Read Files from In-Cloud.

Creating a Task to Transfer Data from Amazon S3

Next, you create a task to transfer data.

Note

Create the task in the AWS account and AWS Region where the source S3 bucket resides.

To create a task that transfers data from S3 to EFS or a self-managed NFS

  1. Open the DataSync console in the AWS Region where your source S3 bucket is located.

  2. Choose Create task, and choose the use case AWS to on-premises.

  3. Choose Create agent.

  4. If you set Auto-assign Public IP to Enable, choose your instance and note the public IP address in the Description tab. You use this IP address later to connect to your sync agent.

    If you set Auto-assign Public IP to Disable, launch or use an existing instance in your VPC to activate the agent. In this case, you use the private IP address of the sync agent to activate the agent from this instance in the VPC.

  5. In the Create Agent wizard, for Agent address enter the EC2 instance's IP address (private or public, as explained in the previous step), and then choose Get key.

    Your browser connects to this IP address to get a unique activation key from your agent. This key securely associates your agent with your AWS account. This IP address doesn't need to be accessible from outside your network, but must be accessible from your browser.

  6. Choose an agent name that you can easily identify later. You can optionally add tags. When you're done, choose Create agent.

  7. Choose AWS to on-premises, and choose Next.

  8. Choose Create new location:

    • For Location type, choose Amazon S3 bucket.

    • For S3 bucket, choose your source S3 bucket.

    • For Folder, choose a folder prefix for the transfer, or you can keep it blank.

      DataSync can autogenerate an IAM role to access your bucket, or you can create your own.

  9. Choose Next. Choose Create new location, choose NFS for Location type, and choose the agent that you just created from the list.

    1. If you are copying to EFS, do the following:

      • For NFS Server, enter the DNS name of your destination EFS.

      • For Mount path, enter / (forward slash) and choose Next.

    2. If you are copying to in-cloud NFS, do the following:

      • For NFS Server, enter the private DNS or IP address of your destination NFS.

      • For Mount path, enter a path that is exported by your NFS server. For more information, see Create an NFS Location.

  10. Choose Next, and optionally name the task and add tags.

  11. Choose or create a CloudWatch Logs log group at the bottom of the page, and choose Next. For more information on working with CloudWatch Logs, see Allowing DataSync to Upload Logs to Amazon CloudWatch Log Groups.

  12. Review the settings on the next page, and choose Create task.

  13. Choose Start to run the task that you just created to transfer data, and then choose Start again on the Start Task page.
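The console procedure above maps onto a short sequence of DataSync API calls. The following is a minimal sketch, assuming a client that behaves like boto3.client("datasync") and an agent that is already activated. The function name is hypothetical, and every ARN, host name, and path is a placeholder that you supply.

```python
def create_s3_to_nfs_task(client, *, bucket_arn, role_arn, agent_arn,
                          nfs_server, mount_path="/", name=None,
                          log_group_arn=None):
    """Create the S3 source, the NFS destination, and the task, then start it.

    Returns a (task ARN, task execution ARN) tuple.
    """
    # Source: the S3 bucket, accessed through an existing IAM role.
    source = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    # Destination: EFS or self-managed NFS, written through the EC2 agent.
    destination = client.create_location_nfs(
        ServerHostname=nfs_server,
        Subdirectory=mount_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    task_args = {
        "SourceLocationArn": source["LocationArn"],
        "DestinationLocationArn": destination["LocationArn"],
    }
    if name:
        task_args["Name"] = name
    if log_group_arn:
        task_args["CloudWatchLogGroupArn"] = log_group_arn
    task = client.create_task(**task_args)
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return task["TaskArn"], execution["TaskExecutionArn"]
```

This sketch starts the task immediately after creating it. In practice, a task that is still in the CREATING status might need to reach AVAILABLE before it can be started, so a real script would likely check the status first.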

Configuring Task Settings

This section describes how to configure task settings, which are available in the Options section when you create a task.

These options control the behavior of a task execution. Behavior includes preserving metadata such as the user ID (UID) or group ID (GID), preserving file permissions, and data integrity verification. If you don't specify values for these options, DataSync uses a set of default options. You can override the options for an individual task execution.

Available options are as follows:

  • Verify data – Verify the data in your destination after transfer. DataSync always performs data integrity checks while transferring and writing data.

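    Choose Check integrity during transfer to rely on the integrity checks that DataSync performs during the transfer, with no additional verification after the transfer.

    Choose Verify only the data transferred to verify only the data that is transferred.

    Choose Verify all data in the destination to verify, at the end of the transfer, that all data in the destination matches the source.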

  • Copy ownership – If you choose this option, DataSync copies file ownership, that is, the user ID (UID) of the file's owner and the group ID (GID) of the file's group.

  • Copy permissions – If you choose this option, DataSync copies file POSIX permissions from the source to the destination.

  • Copy timestamps – If you choose this option, DataSync copies the timestamp metadata from the source to the destination.

  • Keep deleted files – If you choose this option, DataSync keeps files in the destination that don't exist in the source file system.

    If your task deletes objects, you might incur minimum storage duration charges for certain storage classes. For detailed information, see Considerations When Working with S3 Storage Classes in DataSync.

  • Overwrite files – If you choose this option, files at the destination are overwritten by files from the source. If you don't choose this option, the destination file isn't replaced by the source file, even if the destination file differs from the source file.

    If your task overwrites objects, you might incur minimum storage duration charges for certain storage classes. For detailed information, see Considerations When Working with S3 Storage Classes in DataSync.

  • Use available or Set bandwidth limit (MiB/s) – If you choose Use available, DataSync uses all the network bandwidth that is available for the transfer. If you choose Set bandwidth limit (MiB/s), you limit the maximum bandwidth that you want DataSync to use for this task.

  • Queueing – If you use a single agent to run multiple tasks, choose this option to make the tasks run in series (that is, first in, first out). For more information, see Queueing Task Executions.

  • In the Filtering configuration - Optional section, enter a pattern to use as a filter. This pattern defines the criteria for specific files, folders, and objects to exclude from your transfer. You can add more patterns later by editing the task configuration. For more information, see Excluding Data from a Transfer. You can include files, folders, and objects in the transfer when you start a task. For more information, see Start Your Task.

    Note

    To use a pipe in your pattern, you must escape it. For examples, see Filtering the Data Transferred by AWS DataSync.

  • In the Tags - optional section, enter Key and Value to tag your task. A tag is a key-value pair that helps you manage, filter, and search for your tasks. We recommend that you create a name tag for your task.

  • Task logging - optional – If you choose this option, DataSync uses Amazon CloudWatch Log Groups to log activities and errors that occur during the execution of your task. For DataSync to upload logs to your CloudWatch Log Group, DataSync requires a resource policy that grants sufficient permissions. For an example of such a policy, see Allowing DataSync to Upload Logs to Amazon CloudWatch Log Groups.

    For more information, see Working with Log Groups and Log Streams in the Amazon CloudWatch User Guide.
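The console options above can also be expressed as the Options structure and exclude filter accepted by the DataSync CreateTask operation. The following is a minimal sketch. The helper names, the keyword names, the defaults chosen here, and the mapping from console labels to API values are assumptions made for illustration. Escaping a literal pipe with a backslash is also an assumption, based on the note above that pipes must be escaped.

```python
MIB = 1024 * 1024

# Assumed mapping from the console's Verify data choices to VerifyMode values.
VERIFY_MODES = {
    "during_transfer": "NONE",
    "transferred": "ONLY_FILES_TRANSFERRED",
    "all": "POINT_IN_TIME_CONSISTENT",
}


def build_task_options(*, verify="all", copy_ownership=True,
                       copy_permissions=True, copy_timestamps=True,
                       keep_deleted=True, overwrite=True,
                       bandwidth_mib_s=None, queueing=True):
    """Translate console-style choices into an Options dictionary."""
    ownership = "INT_VALUE" if copy_ownership else "NONE"
    return {
        "VerifyMode": VERIFY_MODES[verify],
        "Uid": ownership,
        "Gid": ownership,
        "PosixPermissions": "PRESERVE" if copy_permissions else "NONE",
        "Mtime": "PRESERVE" if copy_timestamps else "NONE",
        "Atime": "BEST_EFFORT" if copy_timestamps else "NONE",
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted else "REMOVE",
        "OverwriteMode": "ALWAYS" if overwrite else "NEVER",
        # The console limit is in MiB/s; the API takes bytes per second,
        # with -1 meaning "use available bandwidth".
        "BytesPerSecond": -1 if bandwidth_mib_s is None
        else int(bandwidth_mib_s * MIB),
        "TaskQueueing": "ENABLED" if queueing else "DISABLED",
    }


def build_exclude_filter(patterns):
    """Join patterns with "|", escaping any literal pipe inside a pattern."""
    escaped = [p.replace("|", "\\|") for p in patterns]
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(escaped)}]
```

The two results are intended to be passed as the Options and Excludes arguments of create_task on a boto3 DataSync client. An Options dictionary of the same shape can be passed as OverrideOptions to start_task_execution to override the settings for a single execution.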