Creating a task for transferring your data
A task describes where and how AWS DataSync transfers data. A task consists of the following:
-
Source location – The storage system or service where DataSync transfers data from.
-
Destination location – The storage system or service where DataSync transfers data to.
-
Task options – Settings such as what files to transfer, how data gets verified, when the task runs, and more.
-
Task executions – When you run a task, it's called a task execution.
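The parts listed above can also be read back from an existing task. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The function names `show_task_parts` and `list_task_runs` are hypothetical. It uses the `describe-task` and `list-task-executions` commands with the AWS CLI's global `--query` option to pick out the relevant response fields.

```shell
# Hypothetical helper: prints the source location, destination location,
# and options of the task whose ARN is passed as the first argument.
show_task_parts() {
  aws datasync describe-task \
    --task-arn "$1" \
    --query '{Source: SourceLocationArn, Destination: DestinationLocationArn, Options: Options}' \
    --output json
}

# Hypothetical helper: prints the ARNs of the task executions recorded
# for the task whose ARN is passed as the first argument.
list_task_runs() {
  aws datasync list-task-executions \
    --task-arn "$1" \
    --query 'TaskExecutions[].TaskExecutionArn' \
    --output text
}
```

Both functions only read information, so they can be run against a task without affecting it.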
Creating your task
When you create a DataSync task, you specify your source and destination locations. You can also customize your task by choosing which files to transfer, specifying how metadata gets handled, setting up a schedule, and more.
Before you create your task, make sure that you understand how DataSync transfers work and review the task quotas.
Important
If you're planning to transfer data to or from an Amazon S3 location, review how DataSync can affect your S3 request charges and the DataSync pricing page.
-
Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.
-
Make sure you're in one of the AWS Regions where you plan to transfer data.
In the left navigation pane, expand Data transfer, then choose Tasks, and then choose Create task.
-
On the Configure source location page, create or choose a source location, then choose Next.
-
On the Configure destination location page, create or choose a destination location, then choose Next.
-
(Recommended) On the Configure settings page, give your task a name that you can remember.
-
While still on the Configure settings page, choose your task options or use the default settings.
You might be interested in some of the following options:
-
Specify the task mode that you want to use.
-
Specify what data to transfer by using a manifest or filters.
-
Configure how to handle file metadata and verify data integrity.
-
Monitor your transfer with task reports or Amazon CloudWatch. We recommend setting up some kind of monitoring for your task.
When you're done, choose Next.
-
Review your task configuration, then choose Create task.
You're ready to start your task.
Once you create your DataSync source and destination locations, you can create your task.
-
In your AWS CLI settings, make sure that you're using one of the AWS Regions where you plan to transfer data.
-
Copy the following create-task command:
aws datasync create-task \
  --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
  --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
  --name "task-name"
-
For --source-location-arn, specify the Amazon Resource Name (ARN) of your source location.
-
For --destination-location-arn, specify the ARN of your destination location.
If you're transferring across AWS Regions or accounts, make sure that the ARN includes the other Region or account ID.
-
(Recommended) For --name, specify a name for your task that you can remember.
-
Specify other task options as needed. You might be interested in some of the following options:
-
Specify what data to transfer by using a manifest or filters.
-
Configure how to handle file metadata and verify data integrity.
-
Monitor your transfer with task reports or Amazon CloudWatch. We recommend setting up some kind of monitoring for your task.
For more options, see create-task. Here's an example create-task command that specifies several options:
aws datasync create-task \
  --source-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
  --destination-location-arn "arn:aws:datasync:us-east-1:account-id:location/location-id" \
  --cloud-watch-log-group-arn "arn:aws:logs:region:account-id" \
  --name "task-name" \
  --options VerifyMode=NONE,OverwriteMode=NEVER,Atime=BEST_EFFORT,Mtime=PRESERVE,Uid=INT_VALUE,Gid=INT_VALUE,PreserveDevices=PRESERVE,PosixPermissions=PRESERVE,PreserveDeletedFiles=PRESERVE,TaskQueueing=ENABLED,LogLevel=TRANSFER
-
Run the create-task command.
If the command is successful, you get a response that shows you the ARN of the task that you created. For example:
{ "TaskArn": "arn:aws:datasync:us-east-1:111222333444:task/task-08de6e6697796f026" }
You're ready to start your task.
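When scripting the CLI steps above, it can help to capture the task ARN directly instead of reading it out of the JSON response. The following is a minimal sketch, assuming the AWS CLI is installed and configured for the right Region. The function name `create_datasync_task` is hypothetical. It relies on the AWS CLI's global `--query` and `--output text` options to print only the `TaskArn` field.

```shell
# Hypothetical helper: creates a DataSync task and prints only its ARN.
# Arguments: source location ARN, destination location ARN, task name.
create_datasync_task() {
  aws datasync create-task \
    --source-location-arn "$1" \
    --destination-location-arn "$2" \
    --name "$3" \
    --query 'TaskArn' \
    --output text
}

# Example usage (placeholders, not real ARNs):
#   task_arn=$(create_datasync_task "$SOURCE_ARN" "$DESTINATION_ARN" "task-name")
```

Keeping the ARN in a variable makes it easier to pass to later commands that act on the same task.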
Task statuses
When you create a DataSync task, you can check its status to see if it's ready to run.
Console status | API status | Description |
---|---|---|
Available | AVAILABLE | The task is ready to start transferring data. |
Running | RUNNING | A task execution is in progress. For more information, see Task execution statuses. |
Unavailable | UNAVAILABLE | A DataSync agent used by the task is offline. For more information, see What do I do if my agent is offline? |
Queued | QUEUED | Another task execution that uses the same DataSync agent is in progress. For more information, see Knowing when your task is queued. |
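A script can check a task's status before starting it. The following is a minimal sketch, assuming the AWS CLI is installed and configured. The function names `get_task_status` and `task_is_ready` are hypothetical. It reads the `Status` field that `describe-task` returns, which holds the API status (for example, `AVAILABLE`).

```shell
# Hypothetical helper: prints the API status of the task whose ARN is
# passed as the first argument.
get_task_status() {
  aws datasync describe-task \
    --task-arn "$1" \
    --query 'Status' \
    --output text
}

# Hypothetical helper: succeeds (exit code 0) only when the task reports
# the AVAILABLE status, meaning it is ready to start transferring data.
task_is_ready() {
  [ "$(get_task_status "$1")" = "AVAILABLE" ]
}

# Example usage (placeholder ARN):
#   if task_is_ready "$TASK_ARN"; then echo "Task can be started"; fi
```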
Creating multiple tasks for transferring large datasets
If you're transferring a large dataset, which might include millions of files or objects, we recommend creating multiple tasks that you can run in parallel. Spreading the workload across multiple tasks (and possibly agents, depending on your locations) helps reduce the time it takes DataSync to prepare and transfer your data.
Consider the following ways that you can spread out a large transfer across several DataSync tasks:
Be mindful that this approach can increase the I/O operations on your storage and affect your network bandwidth. For more information, see the blog post How to accelerate your data transfers with DataSync scale out architectures.
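Once a large transfer is split across several tasks, the tasks can be started one after another so that their executions overlap. The following is a minimal sketch, assuming the AWS CLI is installed and configured and that the tasks already exist. The function name `start_tasks` is hypothetical. It calls `start-task-execution` for each task ARN and prints the resulting `TaskExecutionArn`.

```shell
# Hypothetical helper: starts one execution per task ARN passed as an
# argument and prints each resulting task execution ARN on its own line.
start_tasks() {
  for task_arn in "$@"; do
    aws datasync start-task-execution \
      --task-arn "$task_arn" \
      --query 'TaskExecutionArn' \
      --output text
  done
}

# Example usage (placeholder ARNs):
#   start_tasks "$TASK_ARN_1" "$TASK_ARN_2" "$TASK_ARN_3"
```

The loop only starts the executions and does not wait for them to finish, so the transfers themselves proceed on the DataSync side.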
Creating multiple tasks for segmenting transferred data
If you're transferring different sets of data to the same destination, you can create multiple tasks to help segment the data that you transfer.
For example, if you're transferring to the same S3 bucket named MyBucket, you can create different prefixes in the bucket that correspond to each task. This approach prevents file name conflicts between the datasets and allows you to set different permissions for each prefix. Here's how you might set this up:
-
Create three prefixes in the destination MyBucket named task1, task2, and task3:
-
s3://MyBucket/task1
-
s3://MyBucket/task2
-
s3://MyBucket/task3
-
Create three DataSync tasks named task1, task2, and task3 that transfer to the corresponding prefix in MyBucket.
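The setup above can be scripted with one destination location per prefix. The following is a minimal sketch, assuming the AWS CLI is installed and configured, that a source location already exists for each dataset, and that an IAM role for bucket access is available. The function name `create_prefixed_task` and all ARNs in the usage comment are hypothetical placeholders. It uses `create-location-s3` with `--subdirectory` to point the destination at a prefix, then `create-task` with the same name as the prefix.

```shell
# Hypothetical helper: creates an S3 destination location under the given
# prefix, then a task named after the prefix that transfers to it.
# Arguments: source location ARN, bucket ARN, bucket access role ARN, prefix.
# Prints the ARN of the new task.
create_prefixed_task() {
  src_arn=$1
  bucket_arn=$2
  role_arn=$3
  prefix=$4

  # Create the destination location for this prefix and keep its ARN.
  dst_arn=$(aws datasync create-location-s3 \
    --s3-bucket-arn "$bucket_arn" \
    --subdirectory "/$prefix" \
    --s3-config "BucketAccessRoleArn=$role_arn" \
    --query 'LocationArn' \
    --output text) || return 1

  # Create the task that transfers to that prefix.
  aws datasync create-task \
    --source-location-arn "$src_arn" \
    --destination-location-arn "$dst_arn" \
    --name "$prefix" \
    --query 'TaskArn' \
    --output text
}

# Example usage (placeholders, one call per dataset):
#   create_prefixed_task "$SOURCE_ARN_1" "arn:aws:s3:::MyBucket" "$ROLE_ARN" task1
#   create_prefixed_task "$SOURCE_ARN_2" "arn:aws:s3:::MyBucket" "$ROLE_ARN" task2
#   create_prefixed_task "$SOURCE_ARN_3" "arn:aws:s3:::MyBucket" "$ROLE_ARN" task3
```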