AWS Prescriptive Guidance
Patterns

Automate data loading from Amazon S3 to Amazon Redshift using AWS Data Pipeline

R Type: Re-architect

Source: Databases: Relational

Target: Amazon Redshift

Tags: Amazon S3, Amazon Redshift, AWS Data Pipeline

Categories: Enterprise Applications, Software Infrastructure

Summary

This pattern describes how to automate loading data from an Amazon Simple Storage Service (Amazon S3) bucket into Amazon Redshift by using AWS Data Pipeline.

Assumptions and Prerequisites

Prerequisites

  • An active AWS account

  • An S3 source bucket with permissions that allow the pipeline to read the source files
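A minimal sketch of what read access to the source bucket could look like: a function that builds an IAM policy document granting list and read access to a single prefix. The bucket and prefix names are hypothetical, and the role you attach the policy to is an assumption. With the default settings this would typically be the pipeline's EC2 resource role (DataPipelineDefaultResourceRole), but verify that against your own setup.

```python
import json


def s3_read_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy document that grants read-only access to one S3 prefix."""
    prefix = prefix.strip("/")
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"

    # ListBucket is a bucket-level action, so it targets the bucket ARN.
    list_statement = {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn}
    if prefix:
        # Limit listing to the source prefix (with or without a trailing slash).
        list_statement["Condition"] = {
            "StringLike": {"s3:prefix": [prefix, f"{prefix}/*"]}
        }

    # GetObject is an object-level action, so it targets the object ARN pattern.
    get_statement = {"Effect": "Allow", "Action": "s3:GetObject", "Resource": object_arn}
    return {"Version": "2012-10-17", "Statement": [list_statement, get_statement]}


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    print(json.dumps(s3_read_policy("example-source-bucket", "exports/csv"), indent=2))
```

Scoping GetObject to one prefix keeps the pipeline from reading unrelated objects in the same bucket.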

Architecture

Source technology stack

  • An S3 bucket with CSV files

Target technology stack

  • An Amazon Redshift cluster
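Loading CSV files from S3 into a Redshift table comes down to the Redshift COPY command. Our understanding is that the Data Pipeline template's copy activity issues a COPY on your behalf. The following sketch only builds such a statement as a string. It assumes CSV files with one header row and an IAM role that the cluster can use to read the bucket. The table name, bucket, and role ARN are hypothetical.

```python
def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str, header_rows: int = 1) -> str:
    """Return a Redshift COPY statement that loads CSV objects under an S3 prefix.

    `table` is inserted verbatim, so it must be a trusted, already-valid name.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    if header_rows < 0:
        raise ValueError("header_rows must be non-negative")

    def literal(value: str) -> str:
        # SQL string literal: wrap in single quotes and double any embedded quote.
        return "'" + value.replace("'", "''") + "'"

    lines = [
        f"COPY {table}",
        f"FROM {literal(s3_uri)}",
        f"IAM_ROLE {literal(iam_role_arn)}",
        "FORMAT AS CSV",
    ]
    if header_rows:
        # Skip header lines at the top of each input file.
        lines.append(f"IGNOREHEADER {header_rows}")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Hypothetical table, bucket, and role ARN.
    print(build_copy_sql(
        "public.sales",
        "s3://example-source-bucket/exports/csv/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

Pointing FROM at a prefix lets a single COPY load every matching file under it in parallel.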

Target architecture

AWS data migration architecture

Tools Used

AWS Data Pipeline: You can use AWS Data Pipeline to automate the movement and transformation of data. With AWS Data Pipeline, you define data-driven workflows in which each task runs only after the tasks it depends on complete successfully.
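The dependency rule behind these workflows, where a task runs only after the tasks it depends on succeed, can be shown with a small, purely conceptual sketch. This is not how AWS Data Pipeline is implemented. The task names are hypothetical, and the real service also handles scheduling, retries, and compute resources. The sketch uses Python's standard graphlib module, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_workflow(depends_on: Dict[str, Set[str]], run_task: Callable[[str], bool]) -> Dict[str, str]:
    """Run tasks in dependency order; a task runs only if all its prerequisites succeeded.

    `depends_on` maps each task to the set of tasks it waits for.
    `run_task` performs one task and returns True on success.
    """
    status: Dict[str, str] = {}
    # static_order() yields every task after all of its prerequisites
    # and raises graphlib.CycleError if the dependencies form a cycle.
    for task in TopologicalSorter(depends_on).static_order():
        prerequisites = depends_on.get(task, set())
        if all(status[p] == "SUCCEEDED" for p in prerequisites):
            status[task] = "SUCCEEDED" if run_task(task) else "FAILED"
        else:
            # A failed or skipped prerequisite blocks everything downstream.
            status[task] = "SKIPPED"
    return status


if __name__ == "__main__":
    # Hypothetical task names loosely mirroring an S3-to-Redshift load.
    steps = {
        "provision_ec2": set(),
        "check_s3_input": set(),
        "copy_to_redshift": {"provision_ec2", "check_s3_input"},
    }
    print(run_workflow(steps, lambda task: task != "check_s3_input"))
```

In this run, the failed input check causes the copy step to be skipped rather than attempted.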

Epics

Plan the migration

Tasks

  • Validate the version and engine of the target database. Skills: DBA

  • Create a security group that allows outbound traffic to the source and target databases. Skills: SysAdmin
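For the security group task, here is a minimal sketch of outbound rules in the IpPermissions shape used by the EC2 API. It assumes that the cluster listens on the default Redshift port 5439 and that HTTPS (port 443) egress is enough to reach Amazon S3 and other AWS endpoints. The CIDR range is a placeholder for your own network.

```python
import ipaddress
from typing import Dict, List

REDSHIFT_DEFAULT_PORT = 5439  # default Amazon Redshift listener port


def egress_permissions(redshift_cidr: str, redshift_port: int = REDSHIFT_DEFAULT_PORT) -> List[Dict]:
    """Build IpPermissions entries for outbound rules toward Redshift and HTTPS endpoints."""
    # Raises ValueError for malformed CIDRs or ones with host bits set (e.g. 10.0.1.5/16).
    network = ipaddress.ip_network(redshift_cidr)
    if network.version != 4:
        # IPv6 ranges belong under Ipv6Ranges; this sketch handles IPv4 only.
        raise ValueError("only IPv4 CIDRs are handled in this sketch")
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": redshift_port,
            "ToPort": redshift_port,
            "IpRanges": [{"CidrIp": str(network), "Description": "Amazon Redshift cluster"}],
        },
        {
            # HTTPS so the pipeline's compute can reach Amazon S3 and AWS service endpoints.
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS to AWS endpoints"}],
        },
    ]
```

You could pass the result to boto3's ec2.authorize_security_group_egress(GroupId=<your security group ID>, IpPermissions=...). Keep in mind that a new VPC security group already allows all outbound traffic by default, so these rules matter only if you replace that default with narrower ones.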

Prepare the target database

Tasks

  • Create an Amazon Redshift cluster. Skills: SysAdmin, DBA

  • Extract the list of users, roles, and grants from the source. Skills: SysAdmin, DBA

  • Create the users in the target database. Skills: SysAdmin, DBA

  • Apply the roles from the previous step to the target database. Skills: SysAdmin, DBA

  • Review database options, parameters, network files, and database links from the source, and evaluate whether they apply to the target database. Skills: SysAdmin, DBA
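For the user and role steps, one possible approach is to turn the extracted list into SQL for the target. The sketch below makes several assumptions: the list has already been reduced to a hypothetical {user: [role, ...]} mapping, the cluster supports Redshift role-based access control (CREATE ROLE and GRANT ROLE), and only plain, unquoted identifiers are needed. It creates users with PASSWORD DISABLE so that no password is hard-coded. You would set credentials separately.

```python
import re
from typing import Dict, List

# Standard (unquoted) identifier: letter or underscore, then letters, digits, _ or $.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,126}")


def _name(value: str) -> str:
    """Validate an identifier and normalize it to lowercase."""
    if not _IDENTIFIER.fullmatch(value):
        raise ValueError(f"unsupported identifier: {value!r}")
    # Unquoted identifiers are case-insensitive, so emit them in lowercase.
    return value.lower()


def user_and_role_sql(grants: Dict[str, List[str]]) -> List[str]:
    """Turn a {user: [role, ...]} mapping extracted from the source into Redshift SQL."""
    statements: List[str] = []
    # Create each role once, even if several users share it.
    roles = sorted({_name(role) for role_list in grants.values() for role in role_list})
    statements += [f"CREATE ROLE {role};" for role in roles]
    for user in sorted(grants):
        # PASSWORD DISABLE creates the user without password login; set credentials separately.
        statements.append(f"CREATE USER {_name(user)} PASSWORD DISABLE;")
        statements += [f"GRANT ROLE {_name(role)} TO {_name(user)};" for role in grants[user]]
    return statements
```

Review the generated statements before you run them as a superuser. If your cluster manages access with groups instead of roles, CREATE GROUP and ALTER GROUP with ADD USER are the analogous statements.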

Configure the pipeline

Tasks

  • Create a new pipeline in AWS Data Pipeline. Skills: SysAdmin

  • For Source, choose Build using a template, and then choose the Load data from S3 into Redshift template. Skills: SysAdmin

  • For Parameters, provide the source and target details. Skills: SysAdmin

  • Under Schedule, choose on pipeline activation. Skills: SysAdmin

  • Under Security/Access, leave the AWS Identity and Access Management (IAM) roles at their default values. Skills: SysAdmin

  • Activate the pipeline. Skills: SysAdmin
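These console steps produce a pipeline definition. As a rough illustration of what such a definition contains, the sketch below builds pipeline objects in the format that the AWS Data Pipeline PutPipelineDefinition API accepts. It is not the template's exact output. The object IDs, the two-hour termination limit, and the KEEP_EXISTING insert mode are assumptions, and the sketch omits COPY options (such as CSV settings), networking fields, and any parameters the template may set. scheduleType ondemand means one run per activation, and the role fields use the default IAM role names.

```python
from typing import Dict, List


def _field(key: str, value: str, ref: bool = False) -> Dict[str, str]:
    """One pipeline field: refValue points at another object's id, stringValue is a literal."""
    return {"key": key, ("refValue" if ref else "stringValue"): value}


def _obj(obj_id: str, fields: List[Dict[str, str]]) -> Dict:
    return {"id": obj_id, "name": obj_id, "fields": fields}


def s3_to_redshift_objects(
    s3_dir: str, cluster_id: str, database: str, username: str,
    password: str, table: str, log_uri: str,
) -> List[Dict]:
    """Pipeline objects for an on-demand S3-to-Redshift copy (pipelineObjects format)."""
    return [
        _obj("Default", [
            _field("scheduleType", "ondemand"),  # one run per activation
            _field("failureAndRerunMode", "CASCADE"),
            _field("role", "DataPipelineDefaultRole"),  # default IAM roles
            _field("resourceRole", "DataPipelineDefaultResourceRole"),
            _field("pipelineLogUri", log_uri),
        ]),
        _obj("Ec2Instance", [
            _field("type", "Ec2Resource"),
            _field("terminateAfter", "2 Hours"),
        ]),
        _obj("S3InputDataNode", [
            _field("type", "S3DataNode"),
            _field("directoryPath", s3_dir),
        ]),
        _obj("RedshiftCluster", [
            _field("type", "RedshiftDatabase"),
            _field("clusterId", cluster_id),
            _field("databaseName", database),
            _field("username", username),
            _field("*password", password),  # RedshiftDatabase names this field *password
        ]),
        _obj("DestRedshiftTable", [
            _field("type", "RedshiftDataNode"),
            _field("database", "RedshiftCluster", ref=True),
            _field("tableName", table),
        ]),
        _obj("RedshiftLoadActivity", [
            _field("type", "RedshiftCopyActivity"),
            _field("input", "S3InputDataNode", ref=True),
            _field("output", "DestRedshiftTable", ref=True),
            _field("runsOn", "Ec2Instance", ref=True),
            _field("insertMode", "KEEP_EXISTING"),
        ]),
    ]
```

With boto3, you could pass the list to put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects) on a datapipeline client, check validationErrors in the response, and then call activate_pipeline(pipelineId=pipeline_id). In practice, supply the password from a secrets store rather than hard-coding it.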

Cut over

Tasks

  • Delete the pipeline after the data load completes or when your use case no longer requires it. Skills: DBA, SysAdmin, AppOwner
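You can also script the pipeline deletion. This minimal sketch accepts any client that behaves like boto3's datapipeline client. It reads the pipeline's @pipelineState field from describe_pipelines and calls delete_pipeline only when that state is in an allowed set. Treating FINISHED as the "done" state is an assumption that you should verify against your own pipeline. Deletion cannot be undone.

```python
from typing import Any, Iterable


def delete_if_finished(client: Any, pipeline_id: str,
                       finished_states: Iterable[str] = ("FINISHED",)) -> bool:
    """Delete a pipeline whose @pipelineState is in finished_states; return True if deleted.

    `client` is expected to behave like boto3.client("datapipeline").
    """
    response = client.describe_pipelines(pipelineIds=[pipeline_id])
    fields = response["pipelineDescriptionList"][0]["fields"]
    # Assumed field key holding the pipeline's lifecycle state.
    state = next((f.get("stringValue") for f in fields if f["key"] == "@pipelineState"), None)
    if state in tuple(finished_states):
        client.delete_pipeline(pipelineId=pipeline_id)
        return True
    return False
```

Because the function accepts any client object, you can exercise it with a stub before pointing it at a real account.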

References and Help

References

Contact and help

Migration Pattern Library Support: aws-mpl@amazon.com