Migration guide - Research Service Workbench on AWS

You can migrate studies from Service Workbench on AWS (SWB) to datasets in Research Service Workbench on AWS (RSW). To do so, first create the projects and datasets in RSW, then use the Amazon S3 copy APIs to transfer data from the study S3 bucket(s) to the dataset S3 bucket.

Cost of migration

Copying S3 data between buckets in the same AWS Region incurs no data transfer fees, although standard S3 request charges may still apply. Copying to a different Region incurs data transfer fees. For more information, see Amazon S3 pricing.

Migration prerequisites

After deploying Research Service Workbench on AWS, IT Admins need to create each project the datasets will belong to. You can use the Create Project API in RSW.

Once all projects are created, the IT Admin needs to create a Project Admin user with an email address different from the one used for the IT Admin role. If your email provider supports plus addressing, we recommend appending +pa to the local part of your existing address, for example myemail+pa@amazon.com. Registration emails for the Project Admin user then arrive at your standard email address.

After creating the Project Admin user, IT Admins should add that user to each project that owns a dataset. Use the Add User to Project API, specifying ProjectAdmin as the role in the request body.

When the Project Admin user has been added to all projects, the IT Admin should log in as the Project Admin user and create datasets under each project using the Create Internal Dataset API. Each call creates a folder in the dataset S3 bucket, which gives IT Admins a destination for the data they copy from studies.

Migration process

IT Admins need an IAM user or role with access to both the study S3 bucket(s) and the dataset S3 bucket. The RSW dataset S3 bucket is named rsw-<stage>-<region-abbreviation>-s3datasets<uuid>.

You need the following IAM permissions:

On the study S3 bucket:

  • s3:ListBucket

  • s3:GetObject

On the datasets S3 bucket:

  • s3:ListBucket

  • s3:PutObject
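
The permissions above could be expressed as a single IAM policy document. The following is a minimal sketch, not an official RSW policy: the bucket names, the statement IDs, and the output file name are hypothetical placeholders. It relies on the fact that s3:ListBucket is granted on the bucket ARN, while s3:GetObject and s3:PutObject are granted on the objects inside the bucket (the ARN ending in /*).

```shell
#!/bin/sh
# Hypothetical bucket names: replace with your own study and dataset buckets.
STUDY_BUCKET="myawsaccount-stage-regionabbreviation-sw-studydata"
DATASET_BUCKET="rsw-stage-regionabbreviation-s3datasetsuuid"
# Hypothetical output file name for the generated policy document.
POLICY_FILE="${POLICY_FILE:-migration-policy.json}"

# s3:ListBucket targets the bucket ARN; object actions target bucket/*.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBothBuckets",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": [
        "arn:aws:s3:::${STUDY_BUCKET}",
        "arn:aws:s3:::${DATASET_BUCKET}"
      ]
    },
    {
      "Sid": "ReadStudyObjects",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::${STUDY_BUCKET}/*"
    },
    {
      "Sid": "WriteDatasetObjects",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${DATASET_BUCKET}/*"
    }
  ]
}
EOF
echo "Wrote $POLICY_FILE"
```

The generated file can then be attached to the IAM user or role used for the migration. If you migrate from several study buckets, add each bucket to the first two statements.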

You can perform the migration through either the AWS Management Console or the AWS CLI.

Migrating with AWS Management Console

  1. Sign in to the AWS Management Console with the account that has permissions to the study S3 bucket(s) and the datasets S3 bucket.

  2. Select the study S3 bucket and choose the files from the study you want to migrate into the dataset.

  3. Choose Actions.

  4. Choose Copy.

    Figure 1: Copy the S3 files

  5. Provide the S3 bucket URL for the dataset S3 bucket and folder.

  6. Choose Copy.

    Figure 2: Provide the S3 bucket URL

  7. Verify you receive a success message: Successfully copied objects. You may also verify success by checking the datasets S3 bucket.
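
One way to check the datasets S3 bucket from a terminal is to compare object counts between the source and destination folders. The sketch below assumes the summary printed by aws s3 ls with --recursive --summarize contains a line of the form "Total Objects: N". The helper names count_objects and compare_counts, and the bucket names in the usage comment, are hypothetical.

```shell
#!/bin/sh
# Read `aws s3 ls --recursive --summarize` output on stdin and
# print the number from the "Total Objects: N" summary line.
count_objects() {
  awk -F': *' '/Total Objects/ { print $2 }'
}

# Compare a source count with a destination count.
# Prints the result and returns non-zero when the counts differ.
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo "match: $1 objects"
  else
    echo "mismatch: source=$1 destination=$2"
    return 1
  fi
}

# Usage (needs the AWS CLI and credentials; bucket names are placeholders):
#   src=$(aws s3 ls "s3://STUDY_BUCKET/study-123/" --recursive --summarize | count_objects)
#   dst=$(aws s3 ls "s3://DATASET_BUCKET/study-123/" --recursive --summarize | count_objects)
#   compare_counts "$src" "$dst"
```

Matching counts do not prove the contents are identical, so treat this as a quick sanity check rather than a full verification.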

Migrating with AWS CLI

Run the following command for each folder you need to migrate from an SWB study to an RSW dataset:

aws s3 cp s3://myawsaccount-stage-regionabbreviation-sw-studydata/study-123 s3://rsw-stage-regionabbreviation-s3datasetsuuid/study-123 --recursive

You will receive a copy confirmation for each file in the folder that migrates to the S3 datasets bucket.
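
When several study folders need to migrate, the same command can be wrapped in a loop. This is a minimal sketch under stated assumptions: the bucket names, the folder names study-123 and study-456, the PREVIEW variable, and the copy_study helper are all hypothetical, and each destination folder is assumed to have the same name as its source folder.

```shell
#!/bin/sh
# Hypothetical bucket names and study folders: replace with your own.
STUDY_BUCKET="myawsaccount-stage-regionabbreviation-sw-studydata"
DATASET_BUCKET="rsw-stage-regionabbreviation-s3datasetsuuid"
STUDIES="study-123 study-456"

# PREVIEW=1 (the default here) only prints each command.
# Set PREVIEW=0 to actually run the copy with the AWS CLI.
PREVIEW="${PREVIEW:-1}"

# Copy one study folder to the folder of the same name in the datasets bucket.
copy_study() {
  src="s3://${STUDY_BUCKET}/$1"
  dst="s3://${DATASET_BUCKET}/$1"
  if [ "$PREVIEW" = "1" ]; then
    echo "aws s3 cp $src $dst --recursive"
  else
    aws s3 cp "$src" "$dst" --recursive
  fi
}

for study in $STUDIES; do
  copy_study "$study"
done
```

Running the script with PREVIEW=1 lets you review the source and destination paths before any data moves. The AWS CLI also provides a --dryrun option on aws s3 cp, which displays the operations that would be performed without running them.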

Migration cleanup

After all studies have been migrated to RSW datasets, the IT Admin should remove the Project Admin user created for the migration from every project, using the Remove User from Project API. Once the user has been removed from all projects, delete the user with the Delete User API.