Migrate data from Microsoft Azure Blob to Amazon S3 by using Rclone

Created by Suhas Basavaraj (AWS), Aidan Keane (AWS), and Corey Lane (AWS)

Environment: PoC or pilot

Source: Microsoft Azure storage container

Target: Amazon S3 bucket

R Type: Replatform

Workload: Microsoft

Technologies: Migration; Storage & backup

AWS services: Amazon S3

Summary

This pattern describes how to use Rclone to migrate data from Microsoft Azure Blob object storage to an Amazon Simple Storage Service (Amazon S3) bucket. You can use this pattern to perform a one-time migration or an ongoing synchronization of the data. Rclone is a command-line program written in Go that moves data across the storage services of many cloud providers.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • Data stored in an Azure Blob storage container

Architecture

Source technology stack

  • Azure Blob storage container

Target technology stack

  • Amazon S3 bucket

  • Amazon Elastic Compute Cloud (Amazon EC2) Linux instance

Architecture diagram

Rclone, running on an Amazon EC2 Linux instance, migrates data from the Microsoft Azure Blob storage container to the Amazon S3 bucket.

Tools

  • Amazon Simple Storage Service (Amazon S3) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

  • Rclone is an open-source command-line program inspired by rsync. It is used to manage files across many cloud storage platforms.

Best practices

When you migrate data from Azure to Amazon S3, be mindful of these considerations to avoid unnecessary costs or slow transfer speeds:

  • Create your AWS infrastructure in the same geographical Region as the Azure storage account and Blob container—for example, AWS Region us-east-1 (N. Virginia) and Azure region East US.

  • Avoid using a NAT gateway if possible, because it accrues data processing charges for both ingress and egress traffic.

  • Use a VPC gateway endpoint for Amazon S3 to increase performance and keep the transfer traffic off the NAT gateway. For an example AWS CLI command, see the sketch after this list.

  • Consider using an AWS Graviton2 (ARM) processor-based EC2 instance for lower cost and higher performance than comparable Intel x86 instances. Rclone is cross-compiled for many architectures and provides a precompiled ARM binary.
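
The following AWS CLI command is a minimal sketch of creating a gateway endpoint for Amazon S3. The VPC ID, Region, and route table ID are placeholders for your own values:

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc123example \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc123example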

Epics


Prepare a destination S3 bucket.

Create a new S3 bucket in the appropriate AWS Region or choose an existing bucket as the destination for the data you want to migrate.
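
For example, you can create the bucket with the AWS CLI. The bucket name is a placeholder; in Regions other than us-east-1, also pass a LocationConstraint:

aws s3api create-bucket --bucket amzn-s3-demo-bucket1 --region us-east-1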

Skills required: AWS administrator

Create an IAM instance role for Amazon EC2.

Create a new AWS Identity and Access Management (IAM) role for Amazon EC2. This role gives your EC2 instance write access to the destination S3 bucket.
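
The following AWS CLI commands are a sketch of creating the role and an instance profile to attach to the instance. The role and profile names are examples only; the bucket permissions are added in the next task:

aws iam create-role --role-name Rclone-Migration-Role \
    --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
aws iam create-instance-profile --instance-profile-name Rclone-Migration-Profile
aws iam add-role-to-instance-profile --instance-profile-name Rclone-Migration-Profile \
    --role-name Rclone-Migration-Role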

Skills required: AWS administrator

Attach a policy to the IAM instance role.

Use the IAM console or AWS Command Line Interface (AWS CLI) to create an inline policy for the EC2 instance role that allows write access permissions to the destination S3 bucket. For an example policy, see the Additional information section.
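
For example, if you save the policy from the Additional information section to a local file named s3-access-policy.json, the following AWS CLI command attaches it as an inline policy (the role and policy names are examples):

aws iam put-role-policy --role-name Rclone-Migration-Role \
    --policy-name s3-migration-access \
    --policy-document file://s3-access-policy.json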

Skills required: AWS administrator

Launch an EC2 instance.

Launch an Amazon Linux EC2 instance that is configured to use the newly created IAM service role. This instance will also need access to Azure public API endpoints through the internet. 

Note: Consider using AWS Graviton-based EC2 instances to lower costs. Rclone provides ARM-compiled binaries.
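
The following AWS CLI command is a sketch of launching a Graviton-based instance with the instance profile attached. The AMI ID, instance type, key pair, and subnet ID are placeholders for your own values:

aws ec2 run-instances --image-id ami-0abc123example \
    --instance-type t4g.small \
    --key-name my-key-pair \
    --subnet-id subnet-0abc123example \
    --iam-instance-profile Name=Rclone-Migration-Profile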

Skills required: AWS administrator

Create an Azure AD service principal.

Use the Azure CLI to create an Azure Active Directory (Azure AD) service principal that has read-only access to the source Azure Blob storage container. For instructions, see the Additional information section. Store these credentials on your EC2 instance to the location ~/azure-principal.json.
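
For example, you can copy the credentials file from your workstation to the instance by using scp. The key pair file and instance address are placeholders:

scp -i my-key-pair.pem azure-principal.json ec2-user@<instance-public-ip>:~/azure-principal.json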

Skills required: Cloud administrator, Azure

Download and install Rclone.

Download and install the Rclone command-line program. For installation instructions, see the Rclone installation documentation.
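
For example, on an Amazon Linux instance you can run the install script that the Rclone documentation provides:

curl https://rclone.org/install.sh | sudo bash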

Skills required: General AWS, Cloud administrator

Configure Rclone.

Copy the following rclone.conf sample file. Replace AZStorageAccount with your Azure Storage account name and us-east-1 with the AWS Region where your S3 bucket is located. Save this file to the location ~/.config/rclone/rclone.conf on your EC2 instance.

[AZStorageAccount]
type = azureblob
account = AZStorageAccount
service_principal_file = azure-principal.json

[s3]
type = s3
provider = AWS
env_auth = true
region = us-east-1

Skills required: General AWS, Cloud administrator

Verify Rclone configuration.

To confirm that Rclone is configured and permissions are working properly, verify that Rclone can parse your configuration file and that objects inside your Azure Blob container and S3 bucket are accessible. See the following for example validation commands.

  • List the configured remotes in the configuration file. This will ensure that your configuration file is being parsed correctly. Review the output to make sure that it matches your rclone.conf file.

    rclone listremotes
    AZStorageAccount:
    s3:
  • List the Azure Blob containers in the configured account. Replace AZStorageAccount with the storage account name that you used in the rclone.conf file.

    rclone lsd AZStorageAccount:
    2020-04-29 08:29:26 docs
  • List the files in the Azure Blob container. Replace docs in this command with an actual Blob container name in your Azure storage account.

    rclone ls AZStorageAccount:docs
    824884 administrator-en.a4.pdf
  • List the buckets in your AWS account.

    [root@ip-10-0-20-157 ~]# rclone lsd s3:
    2022-03-07 01:44:40 amzn-s3-demo-bucket1
    2022-03-07 01:45:16 amzn-s3-demo-bucket2
    2022-03-07 02:12:07 amzn-s3-demo-bucket3
  • List the files in the S3 bucket.

    [root@ip-10-0-20-157 ~]# rclone ls s3:amzn-s3-demo-bucket1
    template0.yaml
    template1.yaml

Skills required: General AWS, Cloud administrator

Migrate data from your containers.

Run the Rclone copy or sync command.  

Example: copy

This command copies data from the source Azure Blob container to the destination S3 bucket.

rclone copy AZStorageAccount:blob-container s3:amzn-s3-demo-bucket1

Example: sync

This command synchronizes data between the source Azure Blob container and the destination S3 bucket.

rclone sync AZStorageAccount:blob-container s3:amzn-s3-demo-bucket1

Important: When you use the sync command, data that isn't present in the source container will be deleted from the destination S3 bucket.
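
For large transfers, flags such as --progress, --transfers, and --checkers help you monitor and tune throughput. The following values are examples only; suitable values depend on the instance size and available bandwidth:

rclone copy AZStorageAccount:blob-container s3:amzn-s3-demo-bucket1 \
    --progress --transfers 16 --checkers 32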

Skills required: General AWS, Cloud administrator

Synchronize your containers.

After the initial copy is complete, run the Rclone sync command for ongoing migration so that only new or changed files are copied to the destination S3 bucket. As noted earlier, sync also deletes destination files that no longer exist in the source.
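
For example, to run the synchronization nightly, you can add a crontab entry similar to the following. The schedule and log file path are assumptions:

# Run at 02:00 each day and log the results (example schedule and path)
0 2 * * * /usr/bin/rclone sync AZStorageAccount:blob-container s3:amzn-s3-demo-bucket1 --log-file /var/log/rclone-sync.log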

Skills required: General AWS, Cloud administrator

Verify that data has been migrated successfully.

To check that data was successfully copied to the destination S3 bucket, run the Rclone lsd and ls commands.
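
In addition to lsd and ls, the rclone check command compares the objects in the source and destination and reports any files that are missing or don't match, and rclone size summarizes the object count and total size of the destination:

rclone check AZStorageAccount:blob-container s3:amzn-s3-demo-bucket1
rclone size s3:amzn-s3-demo-bucket1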

Skills required: General AWS, Cloud administrator

Additional information

Example role policy for EC2 instances

This policy gives your EC2 instance read and write access to a specific bucket in your account. If your bucket uses a customer managed key for server-side encryption, the policy might need additional AWS Key Management Service (AWS KMS) permissions; see the example statement after the policy.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:DeleteObject", "s3:GetObject", "s3:PutObject", "s3:PutObjectAcl" ], "Resource": [ "arn:aws:s3:::amzn-s3-demo-bucket/*", "arn:aws:s3:::amzn-s3-demo-bucket" ] }, { "Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "arn:aws:s3:::*" } ] }

Creating a read-only Azure AD service principal

An Azure service principal is a security identity that customer applications, services, and automation tools use to access specific Azure resources. Think of it as a user identity (login and password or certificate) that has a specific role and tightly controlled permissions to access your resources. To create a read-only service principal that follows least-privilege permissions and protects the data in Azure from accidental deletion, follow these steps:

  1. Log in to your Microsoft Azure cloud account portal and launch Cloud Shell in PowerShell or use the Azure Command-Line Interface (CLI) on your workstation.

  2. Create a service principal and configure it with read-only access to your Azure Blob storage account. Save the JSON output of this command to a local file called azure-principal.json. The file will be uploaded to your EC2 instance. Replace the placeholder variables that are shown in braces ({ and }) with your Azure subscription ID, resource group name, and storage account name.

    az ad sp create-for-rbac `
        --name AWS-Rclone-Reader `
        --role "Storage Blob Data Reader" `
        --scopes /subscriptions/{Subscription ID}/resourceGroups/{Resource Group Name}/providers/Microsoft.Storage/storageAccounts/{Storage Account Name}