Synchronize data between Amazon EFS file systems in different AWS Regions by using AWS DataSync - AWS Prescriptive Guidance

Created by Sarat Chandra Pothula (AWS) and Aditya Ambati (AWS)

Code repository: aws-efs-crossregion-datasync

Environment: PoC or pilot

Technologies: Infrastructure; Storage & backup

AWS services: AWS CDK; AWS DataSync; Amazon EFS

Summary

This solution provides a framework for efficient, secure data synchronization between Amazon Elastic File System (Amazon EFS) file systems in different AWS Regions. The approach is scalable and gives you controlled, cross-Region data replication, which can strengthen your disaster recovery and data redundancy strategies.

This pattern uses the AWS Cloud Development Kit (AWS CDK) to deploy the solution resources with an infrastructure as code (IaC) approach. The AWS CDK application deploys the required AWS DataSync, Amazon EFS, Amazon Virtual Private Cloud (Amazon VPC), and Amazon Elastic Compute Cloud (Amazon EC2) resources. Defining these resources as code gives you a repeatable, version-controlled deployment process that follows AWS best practices.

Prerequisites and limitations

Prerequisites

Limitations

  • The solution inherits limitations from DataSync and Amazon EFS, such as data transfer rates, size limits, and Regional availability. For more information, see AWS DataSync quotas and Amazon EFS quotas.

  • This solution supports Amazon EFS only. DataSync also supports other AWS services, such as Amazon Simple Storage Service (Amazon S3) and Amazon FSx for Lustre, but you must modify this solution to synchronize data with them.

Architecture

Architecture diagram for replicating data to an EFS file system in a different Region

This solution deploys the following AWS CDK stacks:

  • Amazon VPC stack – This stack sets up virtual private cloud (VPC) resources, including subnets, an internet gateway, and a NAT gateway, in both the primary and secondary AWS Regions.

  • Amazon EFS stack – This stack deploys Amazon EFS file systems into the primary and secondary Regions and connects them to their respective VPCs.

  • Amazon EC2 stack – This stack launches EC2 instances in the primary and secondary Regions. These instances are configured to mount the Amazon EFS file system, which allows them to access the shared storage.

  • DataSync location stack – This stack uses a custom construct called DataSyncLocationConstruct to create DataSync location resources in the primary and secondary Regions. These resources define endpoints for data synchronization.

  • DataSync task stack – This stack uses a custom construct called DataSyncTaskConstruct to create a DataSync task in the primary Region. The task uses the primary Region's DataSync location as its source and the secondary Region's location as its destination, so it copies data from the primary EFS file system to the secondary one.
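The stack descriptions above imply a deployment order. The file systems need the VPCs. The instances and the DataSync locations need the file systems. The task needs both locations. The following TypeScript sketch models that ordering with a plain topological sort. The stack names and the dependency map are assumptions made for illustration and are not taken from the repository. In the real app, the AWS CDK works out the order from the dependencies between stacks.

```typescript
// Illustrative model of the five stacks described above. The names and the
// dependency map are assumptions for this sketch, not code from the repository.
type StackName = "Vpc" | "Efs" | "Ec2" | "DataSyncLocation" | "DataSyncTask";

const dependsOn: Record<StackName, StackName[]> = {
  Vpc: [],
  Efs: ["Vpc"],                       // mount targets live in the VPC subnets
  Ec2: ["Vpc", "Efs"],                // instances mount the file system
  DataSyncLocation: ["Vpc", "Efs"],   // EFS locations need a subnet, security group, and file system
  DataSyncTask: ["DataSyncLocation"], // the task pairs a source and a destination location
};

// Depth-first topological sort: each stack is emitted after all of its dependencies.
function deploymentOrder(graph: Record<StackName, StackName[]>): StackName[] {
  const order: StackName[] = [];
  const state: Partial<Record<StackName, "visiting" | "done">> = {};

  const visit = (name: StackName): void => {
    const current = state[name];
    if (current === "done") return;
    if (current === "visiting") throw new Error(`Dependency cycle at ${name}`);
    state[name] = "visiting";
    for (const dep of graph[name]) visit(dep);
    state[name] = "done";
    order.push(name);
  };

  for (const name of Object.keys(graph) as StackName[]) visit(name);
  return order;
}

console.log(deploymentOrder(dependsOn).join(" -> "));
// Vpc -> Efs -> Ec2 -> DataSyncLocation -> DataSyncTask
```

In this sketch the DataSync task comes last because it needs the location ARNs from both Regions.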

Tools

AWS services

  • AWS Cloud Development Kit (AWS CDK) is a software development framework that helps you define and provision AWS Cloud infrastructure in code.

  • AWS DataSync is an online data transfer and discovery service that helps you move files or object data to, from, and between AWS storage services.

  • Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them up or down.

  • Amazon Elastic File System (Amazon EFS) helps you create and configure shared file systems in the AWS Cloud.

  • Amazon Virtual Private Cloud (Amazon VPC) helps you launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

Code repository

The code for this pattern is available in the GitHub Amazon EFS Cross-Region DataSync Project repository.

Best practices

Follow the best practices described in Best practices for using the AWS CDK in TypeScript to create IaC projects.

Epics

Task | Description | Skills required

Clone the project repository.

Enter the following command to clone the Amazon EFS Cross-Region DataSync Project repository.

git clone https://github.com/aws-samples/aws-efs-crossregion-datasync.git
AWS DevOps

Install the npm dependencies.

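Change to the root directory of the cloned repository (aws-efs-crossregion-datasync). Then enter the following command to install the dependency versions recorded in the lock file.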

npm ci
AWS DevOps

Choose the primary and secondary Regions.

In the cloned repository, navigate to the src/infa directory. In the Launcher.ts file, replace the <PRIMARY_AWS_REGION> and <SECONDARY_AWS_REGION> placeholders with the codes of the Regions that you want to use (for example, us-east-1 and us-west-2).

const primaryRegion = { account: account, region: '<PRIMARY_AWS_REGION>' };
const secondaryRegion = { account: account, region: '<SECONDARY_AWS_REGION>' };
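A cross-Region deployment needs two valid Region codes that are different from each other. The following TypeScript sketch shows a hypothetical check that you could run on those two objects before synthesis. The validateRegions helper is not part of the repository's Launcher.ts. Its regular expression checks only the general shape of a Region code. It does not check whether the Region exists or is enabled in your account. The account ID 111122223333 is a placeholder.

```typescript
// Hypothetical pre-synthesis check; not part of the repository's Launcher.ts.
interface RegionEnv {
  account: string;
  region: string;
}

// Matches the general shape of a Region code, such as us-east-1, ap-southeast-2,
// or us-gov-west-1. It does not check that the Region exists or is enabled.
const REGION_CODE = /^[a-z]{2}(-[a-z]+)+-\d+$/;

function validateRegions(primary: RegionEnv, secondary: RegionEnv): void {
  for (const env of [primary, secondary]) {
    if (!REGION_CODE.test(env.region)) {
      throw new Error(`"${env.region}" is not a Region code; was the placeholder replaced?`);
    }
  }
  if (primary.region === secondary.region) {
    throw new Error("The primary and secondary Regions must be different.");
  }
}

// Example values: 111122223333 is a placeholder account ID.
const account = "111122223333";
validateRegions({ account, region: "us-east-1" }, { account, region: "us-west-2" });
```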
AWS DevOps

Bootstrap the environment.

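Enter the following command to bootstrap the AWS account in both the primary and the secondary Region. The app deploys stacks to both Regions, so you must bootstrap each one.

cdk bootstrap aws://<aws_account>/<primary_aws_region> aws://<aws_account>/<secondary_aws_region>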

For more information, see Bootstrapping in the AWS CDK documentation.
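The cdk bootstrap command accepts one aws://ACCOUNT/REGION environment for each target. You can therefore prepare both Regions in a single run. The following TypeScript sketch builds that command line from an account ID and a list of Regions and drops duplicates. The bootstrapCommand helper is hypothetical, and 111122223333 is a placeholder account ID.

```typescript
// Hypothetical helper: one aws://ACCOUNT/REGION environment per target Region,
// so a single cdk bootstrap run prepares every Region used by the app.
function bootstrapCommand(account: string, regions: string[]): string {
  const environments: string[] = [];
  for (const region of regions) {
    const env = `aws://${account}/${region}`;
    if (environments.indexOf(env) === -1) environments.push(env); // skip duplicates
  }
  return ["cdk", "bootstrap", ...environments].join(" ");
}

console.log(bootstrapCommand("111122223333", ["us-east-1", "us-west-2"]));
// cdk bootstrap aws://111122223333/us-east-1 aws://111122223333/us-west-2
```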

AWS DevOps

List the AWS CDK stacks.

Enter the following command to view a list of the AWS CDK stacks in the app.

cdk ls
AWS DevOps

Synthesize the AWS CDK stacks.

Enter the following command to produce an AWS CloudFormation template for each stack defined in the AWS CDK app.

cdk synth
AWS DevOps

Deploy the AWS CDK app.

Enter the following command to deploy all of the stacks to your AWS account. The --require-approval never option skips the prompt that otherwise asks you to approve security-sensitive changes, such as new IAM permissions.

cdk deploy --all --require-approval never
AWS DevOps
Task | Description | Skills required

Log in to the EC2 instance in the primary Region.

  1. Using Session Manager, a capability of AWS Systems Manager, log in to the EC2 instance in the primary Region. For instructions, see Connect to your Linux instance with AWS Systems Manager Session Manager.

  2. Change directories to the Amazon EFS mount path.

    cd /mnt/efs
AWS DevOps

Create a temporary file.

Enter the following commands to create a 5 GiB sparse test file in the Amazon EFS mount path and confirm that it exists.

sudo dd if=/dev/zero \
  of=tmptst.dat \
  bs=1G \
  seek=5 \
  count=0
ls -lrt tmptst.dat
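The dd command writes no data (count=0). It only seeks five 1 GiB blocks past the start of the output file, which extends the file to an apparent size of 5 GiB without writing 5 GiB of data. The following TypeScript sketch shows the arithmetic for the size that ls -lrt should report. It assumes GNU dd, where the G suffix means 1024^3 bytes. The ddOutputSize name is hypothetical.

```typescript
// GNU dd reads the "G" suffix as 1024^3 bytes; "GB" would mean 1000^3.
const GIB = 1024 * 1024 * 1024;

// seek skips that many output blocks before writing, and count is the number of
// blocks copied. With count=0 nothing is written, so dd only extends the file to
// seek * bs bytes. (seek + count) * bs assumes the input supplies count full
// blocks, which /dev/zero always does.
function ddOutputSize(blockSize: number, seekBlocks: number, countBlocks: number): number {
  return (seekBlocks + countBlocks) * blockSize;
}

const expectedBytes = ddOutputSize(GIB, 5, 0);
console.log(expectedBytes); // 5368709120
```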
AWS DevOps

Start the DataSync task.

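Enter the following command to replicate the temporary file from the primary Region to the secondary Region. Replace <ARN-task> with the Amazon Resource Name (ARN) of your DataSync task, and replace <PRIMARY_AWS_REGION> with the Region that contains the task. You can find the task ARN in the DataSync console or in the output of aws datasync list-tasks.

aws datasync start-task-execution \
  --task-arn <ARN-task> \
  --region <PRIMARY_AWS_REGION>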

The command returns the ARN of the task execution (TaskExecutionArn) in the following format.

arn:aws:datasync:<region>:<account-ID>:task/<task-ID>/execution/<exec-ID>
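A DataSync task execution ARN has the form arn:aws:datasync:<region>:<account-ID>:task/<task-ID>/execution/<exec-ID>. It embeds the Region and the IDs of both the task and the execution, which is useful if you script the following steps. The following TypeScript sketch extracts those parts. The parseTaskExecutionArn name and its field names are hypothetical, not an AWS SDK type. The example ARN uses placeholder IDs.

```typescript
// Hypothetical parser; the field names are this sketch's own, not an AWS SDK type.
interface TaskExecutionArnParts {
  partition: string;
  region: string;
  accountId: string;
  taskId: string;
  executionId: string;
}

// arn:<partition>:datasync:<region>:<account-ID>:task/<task-ID>/execution/<exec-ID>
const TASK_EXECUTION_ARN =
  /^arn:(aws[a-z-]*):datasync:([a-z0-9-]+):(\d{12}):task\/(task-[a-z0-9]+)\/execution\/(exec-[a-z0-9]+)$/;

function parseTaskExecutionArn(arn: string): TaskExecutionArnParts {
  const match = TASK_EXECUTION_ARN.exec(arn);
  if (match === null) {
    throw new Error(`Not a DataSync task execution ARN: ${arn}`);
  }
  const [, partition, region, accountId, taskId, executionId] = match;
  return { partition, region, accountId, taskId, executionId };
}

// Placeholder ARN for illustration.
const parts = parseTaskExecutionArn(
  "arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0/execution/exec-0123456789abcdef0",
);
console.log(parts.region, parts.taskId, parts.executionId);
// us-east-1 task-0123456789abcdef0 exec-0123456789abcdef0
```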

AWS DevOps

Check the status of the data transfer.

Enter the following command to describe the DataSync task execution. Replace <ARN-task-execution> with the task execution ARN that start-task-execution returned.

aws datasync describe-task-execution \
  --task-execution-arn <ARN-task-execution> \
  --region <PRIMARY_AWS_REGION>

The task execution is complete when its Status field is SUCCESS and the PrepareStatus, TransferStatus, and VerifyStatus fields in its Result object all have the value SUCCESS.
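If you poll from a script instead of checking by hand, the completion rule reduces to a small predicate. The following TypeScript sketch assumes that you have already parsed the JSON output of describe-task-execution. The interface declares only the fields that the check reads, and the executionOutcome name is hypothetical. This sketch also treats an ERROR value in the execution status or in any phase as a failure.

```typescript
// Only the fields this check reads; the real describe-task-execution output has many more.
interface TaskExecutionDescription {
  Status: string;
  Result?: {
    PrepareStatus?: string;
    TransferStatus?: string;
    VerifyStatus?: string;
  };
}

type Outcome = "complete" | "failed" | "in-progress";

// Hypothetical helper: complete when all three phases report SUCCESS, failed as soon as
// the execution or any phase reports ERROR, otherwise still running.
function executionOutcome(desc: TaskExecutionDescription): Outcome {
  const phases = [desc.Result?.PrepareStatus, desc.Result?.TransferStatus, desc.Result?.VerifyStatus];
  if (desc.Status === "ERROR" || phases.some((p) => p === "ERROR")) return "failed";
  if (phases.every((p) => p === "SUCCESS")) return "complete";
  return "in-progress";
}

console.log(
  executionOutcome({
    Status: "SUCCESS",
    Result: { PrepareStatus: "SUCCESS", TransferStatus: "SUCCESS", VerifyStatus: "SUCCESS" },
  }),
); // complete
```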

AWS DevOps

Log in to the EC2 instance in the secondary Region.

  1. Using Session Manager, a capability of AWS Systems Manager, log in to the EC2 instance in the secondary Region. For instructions, see Connect to your Linux instance with AWS Systems Manager Session Manager.

  2. Change directories to the Amazon EFS mount path.

    cd /mnt/efs
AWS DevOps

Validate the replication.

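Enter the following command to verify that the temporary file was replicated to the Amazon EFS file system in the secondary Region. The reported size should match the size of the file in the primary Region (5368709120 bytes).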

ls -lrt tmptst.dat
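To confirm more than the file name, compare the size that ls -lrt reports in the secondary Region with the size that the dd command produced in the primary Region. The following TypeScript sketch performs that check on one line of ls -l output. The sizeFromLsLine helper is hypothetical. The sample line is illustrative, not captured output. The 5 GiB figure assumes GNU dd, where G means 1024^3 bytes.

```typescript
// Expected apparent size from the dd command (bs=1G, seek=5, count=0): 5 GiB.
const EXPECTED_BYTES = 5 * 1024 * 1024 * 1024;

// Hypothetical helper: reads the size column (5th field) of one `ls -l` line.
// Assumes owner and group names without spaces.
function sizeFromLsLine(line: string): number {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 9 || !/^\d+$/.test(fields[4])) {
    throw new Error(`Unexpected ls -l output: ${line}`);
  }
  return Number(fields[4]);
}

// Illustrative line only; your date, owner, and permissions will differ.
const sample = "-rw-r--r--. 1 root root 5368709120 Jan 15 10:02 tmptst.dat";
const size = sizeFromLsLine(sample);
console.log(size === EXPECTED_BYTES ? "sizes match" : `size mismatch: ${size}`);
```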
AWS DevOps
