Using Amazon EMR - AWS Prescriptive Guidance

Using Amazon EMR

This solution is similar to the Data Pipeline solution, because Data Pipeline uses Amazon EMR clusters behind the scenes to run the job. The EMR clusters in the source account read from the source Amazon DynamoDB table and write to a destination Amazon S3 bucket. The EMR clusters in the target account then read from that destination S3 bucket and write to the target DynamoDB table.
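
The data path can be sketched as the HiveQL that each cluster runs. The following Python sketch only builds the statements as strings. It is modeled on the item-map pattern that the Amazon EMR documentation describes for exporting and importing DynamoDB data without a column mapping. The helper functions, the Hive table names (`ddb_source`, `s3_export`, and so on), and the bucket URI are hypothetical placeholders.

```python
# Sketch: build the HiveQL each EMR cluster runs. The helper functions, Hive
# table names, and S3 URI are hypothetical placeholders, not part of any AWS API.
DDB_HANDLER = "org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler"


def ddb_table_ddl(hive_name: str, ddb_table: str) -> str:
    # External Hive table backed by a DynamoDB table; each item is read as a map.
    return (
        f"CREATE EXTERNAL TABLE {hive_name} (item map<string,string>) "
        f"STORED BY '{DDB_HANDLER}' "
        f'TBLPROPERTIES ("dynamodb.table.name" = "{ddb_table}");'
    )


def s3_table_ddl(hive_name: str, s3_uri: str) -> str:
    # External Hive table over the S3 prefix in the destination bucket.
    return (
        f"CREATE EXTERNAL TABLE {hive_name} (item map<string,string>) "
        r"ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' "
        f"LOCATION '{s3_uri}';"
    )


def export_script(source_table: str, s3_uri: str) -> str:
    # Runs on the source-account cluster: DynamoDB -> S3.
    return "\n".join([
        ddb_table_ddl("ddb_source", source_table),
        s3_table_ddl("s3_export", s3_uri),
        "INSERT OVERWRITE TABLE s3_export SELECT * FROM ddb_source;",
    ])


def import_script(target_table: str, s3_uri: str) -> str:
    # Runs on the target-account cluster: S3 -> DynamoDB.
    return "\n".join([
        s3_table_ddl("s3_import", s3_uri),
        ddb_table_ddl("ddb_target", target_table),
        "INSERT OVERWRITE TABLE ddb_target SELECT * FROM s3_import;",
    ])


if __name__ == "__main__":
    uri = "s3://example-destination-bucket/ddb-export/"
    print(export_script("SourceTable", uri))
    print(import_script("TargetTable", uri))
```

Each script would be saved to S3 and run as a Hive step on the matching cluster. Because every item travels as a `map<string,string>`, check how your table's attribute types round-trip before relying on this pattern.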

To replicate DynamoDB tables using this approach, EMR clusters configured with Apache Hive must be launched in both the source and target accounts. Both EMR clusters must be configured with read/write permissions for the destination S3 bucket.
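
For the cross-account part of that requirement, one option is a bucket policy on the destination bucket. The sketch below assumes that the bucket is owned by the target account. It grants the source account's EMR EC2 instance role (a placeholder ARN) permission to list the bucket and to read, write, and delete objects. Delete is included because `INSERT OVERWRITE` replaces the files already at the table location. The source account must also allow these actions in that role's own IAM policy. The function name is hypothetical.

```python
import json


def destination_bucket_policy(bucket: str, source_role_arn: str) -> dict:
    """Hypothetical helper: bucket policy letting a source-account role use the bucket.

    Assumes the bucket is owned by the target account.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Sid": "SourceEmrListBucket",
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the objects under the bucket.
                "Sid": "SourceEmrReadWriteObjects",
                "Effect": "Allow",
                "Principal": {"AWS": source_role_arn},
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    policy = destination_bucket_policy(
        "example-destination-bucket",
        "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole",  # placeholder ARN
    )
    print(json.dumps(policy, indent=2))
```

To apply it, pass `json.dumps(policy)` as the `Policy` argument of the boto3 S3 client's `put_bucket_policy` call. Under this ownership assumption, the target-account cluster needs only an IAM policy on its own instance role.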

Advantages

  • This solution offers more customization options and more control over the data migration process.

Drawbacks

  • The process is more involved, because it requires running Hive queries in both the source and target accounts and creating an external Hive table on the S3 location that holds the exported data.

  • It requires setting up the clusters and terminating them after the job completes.
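
Some of the setup and teardown overhead can be reduced by launching each cluster as a transient cluster that runs its Hive script as a step and then shuts itself down. The following sketch builds the keyword arguments for the boto3 EMR `run_job_flow` call without contacting AWS. The release label, instance type and count, role names (the EMR default roles), and S3 URIs are assumptions to adjust. The `hive-script --run-hive-script` step arguments follow the `command-runner.jar` convention for Hive steps. The function name is hypothetical.

```python
import json


def transient_hive_cluster_request(
    name: str,
    script_uri: str,
    log_uri: str,
    release_label: str = "emr-6.15.0",  # assumption: pick a current EMR release
    instance_type: str = "m5.xlarge",   # assumption: size for your table
    instance_count: int = 3,            # 1 primary node + 2 core nodes
) -> dict:
    """Hypothetical helper: kwargs for a self-terminating EMR cluster running one Hive script."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate the cluster automatically once no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "Run DynamoDB replication HiveQL",
                # Shut the cluster down even if the Hive script fails.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script", "--args",
                             "-f", script_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = transient_hive_cluster_request(
        "ddb-export",
        "s3://example-destination-bucket/scripts/export.q",
        "s3://example-destination-bucket/logs/",
    )
    print(json.dumps(request, indent=2))
```

In each account, the dictionary would be passed as `boto3.client("emr").run_job_flow(**request)`, with `script_uri` pointing at the export or import script. Setting `KeepJobFlowAliveWhenNoSteps` to `False` ends the cluster after its last step. A long-running cluster would instead be stopped explicitly with `terminate_job_flows`.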