Tutorial: Transferring data from on-premises storage to Amazon S3 across AWS accounts

When using AWS DataSync with on-premises storage, you typically copy data to an AWS storage service that belongs to the same AWS account as your DataSync agent. There are situations, however, where you might need to transfer data to an Amazon S3 bucket that's associated with a different account.

Important

Copying data across AWS accounts by using the methods in this tutorial works only when Amazon S3 is one of the DataSync transfer locations.

Overview

It's not uncommon to need to transfer data between different AWS accounts, especially if you have separate teams managing your organization's resources. Here's what a cross-account transfer using DataSync can look like:

  • Source account: The AWS account for managing network resources. This is the account that you'll activate your DataSync agent with.

  • Destination account: The AWS account for managing the S3 bucket that you need to transfer data to.

The following diagram illustrates this kind of scenario.


[Figure: An example DataSync scenario of data moving from an on-premises storage system, through an AWS Direct Connect connection or across the internet, into AWS. The data is first transferred into one AWS account (your source account), before finally making it into an Amazon S3 bucket in a different AWS account (your destination account).]

Required permissions

Before you begin, make sure that your source and destination AWS accounts have the right permissions to complete a cross-account transfer to an S3 bucket.

Required permissions for your source account

For your source AWS account, there are two sets of permissions to consider for this kind of cross-account transfer. One set of permissions is for the user who works with DataSync to create and start the transfer task (for example, your storage administrator). The other set of permissions allows the DataSync service to transfer objects to the S3 bucket in your destination account on your behalf.

User permissions

You need the following permissions in your source account to use DataSync while going through this tutorial:

  • datasync:CancelTaskExecution

  • datasync:CreateLocation*

  • datasync:CreateTask

  • datasync:DescribeLocation*

  • datasync:DescribeTask

  • datasync:DescribeTaskExecution

  • datasync:ListLocations

  • datasync:ListTasks

  • datasync:ListTaskExecutions

  • datasync:StartTaskExecution

  • iam:AttachRolePolicy

  • iam:CreateRole

  • iam:CreatePolicy

  • iam:ListRoles

  • iam:PassRole

  • s3:GetBucketLocation

  • s3:ListAllMyBuckets

  • s3:ListBucket

Tip

For user permissions, consider using AWSDataSyncFullAccess, an AWS managed policy that provides full access to DataSync and minimal access to its dependencies. This managed policy also provides transfer task logging by default.
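
If you prefer a custom policy over the managed one, the user permissions listed above could be collected into a single identity-based policy. The following is a minimal sketch: the function name print_datasync_user_policy is hypothetical, and "Resource": "*" is an assumption made for brevity that you would likely want to narrow.

```shell
# Sketch only: prints an IAM policy document containing the user
# permissions listed in this tutorial. The function name is hypothetical,
# and the wildcard Resource is an assumption, not a recommendation.
print_datasync_user_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "datasync:CancelTaskExecution",
        "datasync:CreateLocation*",
        "datasync:CreateTask",
        "datasync:DescribeLocation*",
        "datasync:DescribeTask",
        "datasync:DescribeTaskExecution",
        "datasync:ListLocations",
        "datasync:ListTasks",
        "datasync:ListTaskExecutions",
        "datasync:StartTaskExecution",
        "iam:AttachRolePolicy",
        "iam:CreateRole",
        "iam:CreatePolicy",
        "iam:ListRoles",
        "iam:PassRole",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

Calling print_datasync_user_policy writes the JSON to standard output, so you can redirect it to a file and paste it into the IAM policy editor.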

DataSync permissions

DataSync needs permission to write data to the S3 bucket in your destination account on your behalf. In your source account, you'll create an AWS Identity and Access Management (IAM) role that can do this. You'll then specify this role when creating your DataSync destination location.

Required permissions for your destination account

For your destination AWS account, you need permission to disable your S3 bucket's access control lists (ACLs) and update the bucket's policy. For more information on these specific permissions, see the Amazon S3 User Guide.

Step 1: In your source account, create a DataSync agent

To get started, you must create a DataSync agent that can read from your on-premises storage system and communicate with AWS. This process includes deploying an agent in your on-premises storage environment and activating the agent in your source AWS account.

Note

The steps in this tutorial apply to any type of agent and service endpoint that you use.

To create a DataSync agent
  1. Deploy a DataSync agent in your on-premises storage environment.

  2. Choose a service endpoint that the agent will use to communicate with AWS.

  3. Activate your agent in your source account.
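
If you activate the agent from the command line instead of the console, the last step might look like the following sketch. It assumes that you have already deployed the agent and obtained an activation key for it; the function name activate_agent is hypothetical.

```shell
# Sketch only: activates an already-deployed agent in the account that the
# current AWS CLI credentials belong to (your source account).
#   $1 = activation key obtained from the deployed agent (assumed to exist)
#   $2 = a name for the agent
activate_agent() {
  aws datasync create-agent \
    --activation-key "$1" \
    --agent-name "$2"
}
```

The command returns the agent's ARN, which you need when you create the source location.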

Step 2: In your source account, create a DataSync source location for your on-premises storage

In your source account, create a DataSync source location for the on-premises storage system that you're transferring data from. This location should use the agent that you just activated in your source account.
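
This tutorial doesn't say which kind of on-premises storage you have, so the following sketch assumes an NFS file server purely for illustration. Other storage types use different create-location commands. The function name create_nfs_source_location is hypothetical.

```shell
# Sketch only, assuming the on-premises storage is an NFS server.
#   $1 = NFS server hostname or IP address
#   $2 = exported path to read from (for example, /exports/data)
#   $3 = ARN of the agent that you activated in your source account
create_nfs_source_location() {
  aws datasync create-location-nfs \
    --server-hostname "$1" \
    --subdirectory "$2" \
    --on-prem-config "AgentArns=$3"
}
```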

Step 3: In your source account, create an IAM role for DataSync

In your source account, you need an IAM role that gives DataSync permission to write to the S3 bucket in your destination account on your behalf.

Normally, when you create a transfer location for an S3 bucket in the DataSync console, DataSync can automatically create and assume a role that has the right permissions to write to that bucket. Since you're transferring across accounts, however, you must create the role manually.

Create the IAM role

Create an IAM role with DataSync as the trusted entity.

To create the IAM role
  1. Log in to the AWS Management Console with your source account.

  2. Open the IAM console at https://console.aws.amazon.com/iam/.

  3. In the left navigation pane, under Access management, choose Roles, and then choose Create role.

  4. On the Select trusted entity page, for Trusted entity type, choose AWS service.

  5. For Use case, choose DataSync in the dropdown list and select DataSync. Choose Next.

  6. On the Add permissions page, choose Next.

  7. Give your role a name and choose Create role.

For more information, see Creating a role for an AWS service (console) in the IAM User Guide.
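
As an alternative to the console steps above, a role that trusts DataSync could be created with the AWS CLI. This is a sketch under the assumption that your CLI credentials belong to the source account; the function name create_datasync_role is hypothetical.

```shell
# Sketch only: creates an IAM role whose trust policy lets the DataSync
# service assume it.
#   $1 = name for the new role
create_datasync_role() {
  aws iam create-role \
    --role-name "$1" \
    --assume-role-policy-document '{
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "Service": "datasync.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }'
}
```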

Attach a custom policy to the IAM role

The IAM role that you just created needs a policy that allows DataSync to write to the S3 bucket in your destination account.

To attach a custom policy to the IAM role
  1. On the Roles page of the IAM console, search for the role that you just created and choose its name.

  2. On the role's details page, choose the Permissions tab. Choose Add permissions, and then choose Create inline policy.

  3. Choose the JSON tab and do the following:

    1. Paste the following JSON into the policy editor:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Action": [
              "s3:GetBucketLocation",
              "s3:ListBucket",
              "s3:ListBucketMultipartUploads"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::destination-bucket"
          },
          {
            "Action": [
              "s3:AbortMultipartUpload",
              "s3:DeleteObject",
              "s3:GetObject",
              "s3:ListMultipartUploadParts",
              "s3:PutObject",
              "s3:GetObjectTagging",
              "s3:PutObjectTagging"
            ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::destination-bucket/*"
          }
        ]
      }

    2. Replace each instance of destination-bucket with the name of the S3 bucket in your destination account.

  4. Choose Next. Give your policy a name and choose Create policy.
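
The same inline policy could also be attached from the command line. The following sketch fills in the bucket name and passes the policy to the role; the function name and the policy name datasync-destination-bucket-access are hypothetical.

```shell
# Sketch only: attaches the inline policy from this step to the DataSync
# role. The policy name below is a made-up example.
#   $1 = name of the DataSync IAM role in your source account
#   $2 = name of the S3 bucket in your destination account
attach_destination_bucket_policy() {
  policy=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::${2}"
    },
    {
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::${2}/*"
    }
  ]
}
EOF
)
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name datasync-destination-bucket-access \
    --policy-document "$policy"
}
```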

Step 4: In your destination account, disable ACLs for your S3 bucket

It's important that all the data that you copy to the S3 bucket belongs to your destination account. To ensure that this account owns the data, disable the bucket's access control lists (ACLs).

To disable ACLs for an S3 bucket
  1. In the AWS Management Console, switch over to your destination account.

  2. Open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  3. In the left navigation pane, choose Buckets.

  4. In the Buckets list, choose the S3 bucket that you're transferring data to.

  5. On the bucket's detail page, choose the Permissions tab.

  6. Under Object Ownership, choose Edit.

  7. If it isn't already selected, choose the ACLs disabled (recommended) option.

  8. Choose Save changes.

For more information, see Controlling ownership of objects and disabling ACLs for your bucket in the Amazon S3 User Guide.
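
The console option ACLs disabled corresponds to the BucketOwnerEnforced object ownership setting, so the same change could be made with the AWS CLI. This sketch assumes that your CLI credentials belong to the destination account; the function name disable_bucket_acls is hypothetical.

```shell
# Sketch only: disables ACLs on the bucket by enforcing bucket-owner
# ownership of every object. Run with destination-account credentials.
#   $1 = name of the S3 bucket in your destination account
disable_bucket_acls() {
  aws s3api put-bucket-ownership-controls \
    --bucket "$1" \
    --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerEnforced}]'
}
```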

Step 5: In your destination account, update your S3 bucket policy

In your destination account, modify the destination S3 bucket policy to include the DataSync IAM role that you created in your source account.

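The updated bucket policy (provided to you in the following instructions) includes two principals:

  • The first principal (source-datasync-role) is the IAM role that DataSync uses to transfer data to the S3 bucket. You created this role in Step 3.

  • The second principal (source-user-role) is the IAM role that you use to create the DataSync destination location for the S3 bucket in Step 6.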

To update the destination S3 bucket policy
  1. While still logged in to the S3 console with your destination account, choose the S3 bucket that you're copying data to.

  2. On the bucket's detail page, choose the Permissions tab.

  3. Under Bucket policy, choose Edit and do the following to modify your S3 bucket policy:

    1. Update what's in the editor to include the following policy statements:

      {
        "Version": "2008-10-17",
        "Statement": [
          {
            "Sid": "DataSyncCreateS3LocationAndTaskAccess",
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::source-account:role/source-datasync-role"
            },
            "Action": [
              "s3:GetBucketLocation",
              "s3:ListBucket",
              "s3:ListBucketMultipartUploads",
              "s3:AbortMultipartUpload",
              "s3:DeleteObject",
              "s3:GetObject",
              "s3:ListMultipartUploadParts",
              "s3:PutObject",
              "s3:GetObjectTagging",
              "s3:PutObjectTagging"
            ],
            "Resource": [
              "arn:aws:s3:::destination-bucket",
              "arn:aws:s3:::destination-bucket/*"
            ]
          },
          {
            "Sid": "DataSyncCreateS3Location",
            "Effect": "Allow",
            "Principal": {
              "AWS": "arn:aws:iam::source-account:role/source-user-role"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::destination-bucket"
          }
        ]
      }

    2. Replace each instance of source-account with the AWS account ID for your source account.

    3. Replace source-datasync-role with the IAM role that you created for DataSync in your source account.

    4. Replace each instance of destination-bucket with the name of the S3 bucket in your destination account.

    5. Replace source-user-role with the IAM role that includes the required user permissions to use DataSync.

  4. Choose Save changes.
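
For a bucket that has no other policy statements, the policy from this step could be applied with the AWS CLI instead. Note that put-bucket-policy replaces the entire bucket policy, so this sketch is not suitable if the bucket already has statements that you need to keep. It assumes destination-account credentials, and the function name put_destination_bucket_policy is hypothetical.

```shell
# Sketch only: fills in the placeholders of the bucket policy and applies
# it. WARNING: this overwrites any existing bucket policy.
#   $1 = name of the S3 bucket in your destination account
#   $2 = AWS account ID of your source account
#   $3 = name of the DataSync IAM role in your source account
#   $4 = name of the IAM role with the DataSync user permissions
put_destination_bucket_policy() {
  policy=$(cat <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "DataSyncCreateS3LocationAndTaskAccess",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${2}:role/${3}" },
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:PutObject",
        "s3:GetObjectTagging",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::${1}",
        "arn:aws:s3:::${1}/*"
      ]
    },
    {
      "Sid": "DataSyncCreateS3Location",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::${2}:role/${4}" },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${1}"
    }
  ]
}
EOF
)
  aws s3api put-bucket-policy --bucket "$1" --policy "$policy"
}
```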

Step 6: In your source account, create a DataSync destination location for your S3 bucket

In your source account, you need to create a DataSync location for the S3 bucket in your destination account.

The DataSync console won't let you create locations for storage resources in another AWS account. However, you can do this by using AWS CloudShell, a browser-based, pre-authenticated shell that you launch directly from the console. CloudShell allows you to run the AWS CLI commands for completing this tutorial without downloading or installing command line tools.

Note

If you want to complete the following steps by using a command line tool other than CloudShell, make sure your AWS CLI profile uses the same source-user-role that you specified in the destination S3 bucket policy. For more information, see the AWS Command Line Interface User Guide.

To create a DataSync destination location by using CloudShell
  1. In the AWS Management Console, switch back to your source account.

  2. Open the AWS DataSync console at https://console.aws.amazon.com/datasync/.

  3. Do one of the following to launch CloudShell:

    • Choose the CloudShell icon on the console navigation bar. It's located to the right of the search box.

    • Use the search box on the console navigation bar to search for CloudShell and then choose the CloudShell option.

  4. Copy the following command:

    aws datasync create-location-s3 \
      --s3-bucket-arn arn:aws:s3:::destination-bucket \
      --s3-config '{
        "BucketAccessRoleArn":"arn:aws:iam::source-user-account:role/source-datasync-role"
      }'

  5. Replace destination-bucket with the name of the S3 bucket in your destination account.

  6. Replace source-user-account with the AWS account ID for your source account.

  7. Replace source-datasync-role with the DataSync IAM role that you created in your source account.

  8. Run the command in CloudShell.

    If the command returns a DataSync location ARN similar to this, you successfully created the location:

    { "LocationArn": "arn:aws:datasync:us-east-2:123456789012:location/loc-abcdef01234567890" }
  9. In the left navigation pane, expand Data transfer, then choose Locations.

From your source account, you can now see the location that you just created for the S3 bucket in your destination account.
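
If you'd rather stay in CloudShell than switch to the console, the new location could be inspected with a describe call. This is a small sketch; the function name show_s3_location is hypothetical.

```shell
# Sketch only: prints the details of an S3 location, including the bucket
# ARN and the bucket access role that DataSync will assume.
#   $1 = the LocationArn returned by create-location-s3
show_s3_location() {
  aws datasync describe-location-s3 --location-arn "$1"
}
```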

Step 7: In your source account, create and start your DataSync transfer task

Before you move your data, let's recap what you've done so far:

  • In your source account, you deployed and activated your DataSync agent. The agent can read from your on-premises storage system and communicate with AWS.

  • In your source account, you created an IAM role that allows DataSync to write data to the S3 bucket in your destination account.

  • In your destination account, you configured your S3 bucket so that DataSync can access the bucket and write data to it.

  • In your source account, you created the DataSync source and destination locations for your transfer.

To create and start the DataSync transfer task
  1. While still using the DataSync console in your source account, expand Data transfer in the left navigation pane, then choose Tasks and Create task.

  2. On the Configure source location page, choose Choose an existing location. Choose the source location that you're copying data from (your on-premises storage), and then choose Next.

  3. On the Configure destination location page, choose Choose an existing location. Choose the destination location that you're copying data to (the S3 bucket in your destination account), and then choose Next.

  4. On the Configure settings page, give the task a name. As needed, configure additional settings, such as specifying an Amazon CloudWatch log group. Choose Next.

  5. On the Review page, review your settings and choose Create task.

  6. On the task's details page, choose Start, and then choose one of the following:

    • To run the task without modification, choose Start with defaults.

    • To modify the task before running it, choose Start with overriding options.

When your task finishes, check the S3 bucket in your destination account. You should see the data that was transferred from your on-premises storage system.
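
The console steps above could also be scripted. The following sketch creates a task with default settings from the two location ARNs and starts it right away. It assumes source-account credentials, and the function name create_and_start_task is hypothetical.

```shell
# Sketch only: creates a task with default options and starts one
# execution of it. Prints the task execution ARN on success.
#   $1 = ARN of the source location (your on-premises storage)
#   $2 = ARN of the destination location (the S3 bucket)
#   $3 = a name for the task
create_and_start_task() {
  task_arn=$(aws datasync create-task \
    --source-location-arn "$1" \
    --destination-location-arn "$2" \
    --name "$3" \
    --query TaskArn --output text) || return 1

  aws datasync start-task-execution \
    --task-arn "$task_arn" \
    --query TaskExecutionArn --output text
}
```

You can pass the printed execution ARN to aws datasync describe-task-execution to follow the transfer's status.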

Troubleshooting

Refer to the following information if you run into issues trying to complete your cross-account transfer.

Permissions errors

When setting up a cross-account transfer with Amazon S3, you might see permissions errors. For example, here's a common permissions error when trying to create an S3 destination location:

An error occurred (InvalidRequestException) when calling the CreateLocationS3 operation: DataSync location access test failed: could not perform s3:HeadBucket on bucket DOC-EXAMPLE-DESTINATION-BUCKET. Access denied. Ensure bucket access role has s3:ListBucket permission.

This error means that your source AWS account user permissions are missing the s3:ListBucket permission. These permissions are for the user who creates and starts DataSync tasks. Add s3:ListBucket to your user permissions and try again to create the destination location.
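
When you investigate this error, it can help to confirm which identity your CLI session is using and whether that identity can reach the bucket at all. The following is a diagnostic sketch rather than an official procedure; the function name check_bucket_access is hypothetical.

```shell
# Sketch only: prints the ARN of the identity behind the current CLI
# credentials, then attempts the same HeadBucket call that the error
# message mentions. Compare the printed ARN with the source-user-role
# named in the destination bucket policy.
#   $1 = name of the S3 bucket in your destination account
check_bucket_access() {
  aws sts get-caller-identity --query Arn --output text &&
  aws s3api head-bucket --bucket "$1"
}
```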
