Using AWS Data Pipeline

AWS Data Pipeline is a web service that you can use to automate the movement and transformation of data. Using Data Pipeline, you can create a pipeline in the source account (Account-A) to export the table data. The exported data is stored in an Amazon Simple Storage Service (Amazon S3) bucket in the target account. The S3 bucket in the target account must be accessible from the source account. To allow this cross-account access, update the access control list (ACL) on the target S3 bucket.
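
As a minimal sketch, the following Python (boto3) call grants the source account full control of the target bucket by replacing its ACL. The bucket name and canonical user IDs are placeholder assumptions for illustration; note that put_bucket_acl replaces the existing ACL, so the grant for the bucket owner must be included as well.

```python
import boto3

# Run with credentials from the target account (Account-B), which owns the bucket.
s3 = boto3.client("s3")

TARGET_BUCKET = "my-cross-account-export-bucket"                      # placeholder bucket name
SOURCE_ACCOUNT_CANONICAL_ID = "<source-account-canonical-user-id>"    # placeholder
TARGET_ACCOUNT_CANONICAL_ID = "<target-account-canonical-user-id>"    # placeholder

# Grant the source account (Account-A) full control so its export job can write
# to the bucket, while the bucket owner (Account-B) keeps full control.
s3.put_bucket_acl(
    Bucket=TARGET_BUCKET,
    AccessControlPolicy={
        "Owner": {"ID": TARGET_ACCOUNT_CANONICAL_ID},
        "Grants": [
            {
                "Grantee": {"Type": "CanonicalUser", "ID": TARGET_ACCOUNT_CANONICAL_ID},
                "Permission": "FULL_CONTROL",
            },
            {
                "Grantee": {"Type": "CanonicalUser", "ID": SOURCE_ACCOUNT_CANONICAL_ID},
                "Permission": "FULL_CONTROL",
            },
        ],
    },
)
```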

Create another pipeline in the target account (Account-B) to import the data from the S3 bucket into the target table.
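
The pipelines can be defined through the console templates or programmatically. The following Python (boto3) sketch outlines the programmatic path for the import pipeline in Account-B. The pipeline name, S3 path, table name, and IAM role names are placeholder assumptions, and the definition is abbreviated; the EmrCluster and EmrActivity objects that the DynamoDB import template supplies are omitted for brevity.

```python
import boto3

# Run with credentials from the target account (Account-B).
dp = boto3.client("datapipeline")

# Create an empty pipeline shell; uniqueId guards against duplicate creation on retries.
pipeline = dp.create_pipeline(
    name="ddb-import-from-s3",
    uniqueId="ddb-import-from-s3-v1",
    description="Import DynamoDB table data from the cross-account S3 export",
)
pipeline_id = pipeline["pipelineId"]

# Abbreviated, illustrative definition. A complete import pipeline also needs an
# EmrCluster resource and an EmrActivity that runs the DynamoDB import step; the
# console's "Import DynamoDB backup data from S3" template provides these objects.
pipeline_objects = [
    {
        "id": "Default",
        "name": "Default",
        "fields": [
            {"key": "scheduleType", "stringValue": "ondemand"},
            {"key": "role", "stringValue": "DataPipelineDefaultRole"},
            {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        ],
    },
    {
        "id": "S3InputLocation",
        "name": "S3InputLocation",
        "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            # Placeholder path: the folder that the export pipeline wrote to.
            {"key": "directoryPath", "stringValue": "s3://my-cross-account-export-bucket/exports/"},
        ],
    },
    {
        "id": "DDBDestinationTable",
        "name": "DDBDestinationTable",
        "fields": [
            {"key": "type", "stringValue": "DynamoDBDataNode"},
            {"key": "tableName", "stringValue": "my-target-table"},   # placeholder table name
            {"key": "writeThroughputPercent", "stringValue": "0.5"},  # cap WCU consumption
        ],
    },
    # ... EmrCluster and EmrActivity objects omitted for brevity ...
]

# put_pipeline_definition reports validation errors and warnings in its response;
# once the definition is accepted, activation starts the on-demand run.
dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
dp.activate_pipeline(pipelineId=pipeline_id)
```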

This was the traditional way to back up Amazon DynamoDB tables to Amazon S3 and to restore them from Amazon S3, until AWS Glue introduced native support for reading from DynamoDB tables.

Advantages

  • It's a serverless solution.

  • No new code is required.

  • AWS Data Pipeline uses Amazon EMR clusters behind the scenes to run the job, so this approach is efficient and can handle large datasets.

Drawbacks

  • Additional AWS services (Data Pipeline and Amazon S3) are required.

  • The process consumes provisioned throughput on both the source and target tables, so it can affect their performance and availability.

  • This approach incurs additional costs beyond the cost of the DynamoDB read capacity units (RCUs) and write capacity units (WCUs) that are consumed.