Automate data loading from Amazon S3 to Amazon Redshift using AWS Data Pipeline - AWS Prescriptive Guidance

Created by: AWS

Environment: PoC or pilot

Technologies: Storage & backup; Databases; Analytics

AWS services: Amazon S3; Amazon Redshift

Summary

This pattern walks you through migrating data from an Amazon Simple Storage Service (Amazon S3) bucket to Amazon Redshift by using AWS Data Pipeline.

Prerequisites and limitations

Prerequisites

  • An active AWS account

  • An S3 source bucket with permissions that allow AWS Data Pipeline to read the source files

Architecture

Source technology stack

  • An S3 bucket with CSV files

Target technology stack

  • An Amazon Redshift cluster

Data migration architecture

Tools

  • Data Pipeline - You can use AWS Data Pipeline to automate the movement and transformation of data. With Data Pipeline, you can define data-driven workflows so that tasks can proceed after the successful completion of previous tasks.
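As a rough illustration of "tasks can proceed after the successful completion of previous tasks", the fragment below sketches two activities in pipeline-definition-file syntax, where the second names the first in its `dependsOn` field. The object ids, the shell command, and the `upstream_of` helper are hypothetical, and the fragment leaves out the data nodes, compute resource, and schedule that a complete definition would need.

```python
# Hypothetical two-step workflow fragment; ids and the command are placeholders.
# This is not a complete definition: data nodes, resources, and schedule are omitted.
definition = {
    "objects": [
        {
            "id": "CheckInputFiles",
            "type": "ShellCommandActivity",
            "command": "echo 'input files are present'",
        },
        {
            "id": "LoadIntoRedshift",
            "type": "RedshiftCopyActivity",
            "insertMode": "KEEP_EXISTING",
            # Runs only after CheckInputFiles finishes successfully.
            "dependsOn": {"ref": "CheckInputFiles"},
        },
    ]
}


def upstream_of(definition, object_id):
    """Return the id that object_id depends on, or None if it has no dependency."""
    for obj in definition["objects"]:
        if obj["id"] == object_id:
            return obj.get("dependsOn", {}).get("ref")
    return None
```

Here `upstream_of(definition, "LoadIntoRedshift")` returns `"CheckInputFiles"`, which is the ordering Data Pipeline would enforce at run time.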

Epics

Task | Description | Skills required
Validate the version and engine of the target database. | | DBA
Create an outbound security group for the source and target databases. | | SysAdmin
Task | Description | Skills required
Create an Amazon Redshift cluster. | | SysAdmin, DBA
Extract users, roles, and grants list from the source. | | SysAdmin, DBA
Create users in the target database. | | SysAdmin, DBA
Apply roles from the previous step to the target database. | | SysAdmin, DBA
Review database options, parameters, network files, and database links from the source, and evaluate their applicability to the target database. | | SysAdmin, DBA
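The cluster-creation task above could also be scripted instead of done in the console. The following is a minimal sketch that assumes boto3 and a single-node cluster; the `dc2.large` node type, the private-access setting, and both function names are illustrative assumptions, not values from this pattern.

```python
def build_cluster_request(cluster_id, db_name, master_user, master_password,
                          node_type="dc2.large"):
    """Assemble keyword arguments for the Redshift CreateCluster call.

    The node type and single-node layout are assumptions for a small PoC.
    """
    return {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "single-node",
        "NodeType": node_type,
        "DBName": db_name,
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "PubliclyAccessible": False,
    }


def create_cluster(client, request):
    """Send the request and return the status that Redshift reports."""
    response = client.create_cluster(**request)
    return response["Cluster"]["ClusterStatus"]


# Usage sketch (requires AWS credentials; not executed here):
#   import boto3
#   client = boto3.client("redshift")
#   request = build_cluster_request("poc-cluster", "dev", "awsuser", password)
#   print(create_cluster(client, request))
```

Passing the client in as a parameter keeps the request-building logic easy to check without calling AWS, and keeps the password out of the source file.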
Task | Description | Skills required
Create a new pipeline in AWS Data Pipeline. | | SysAdmin
For Source, choose the template that loads data from Amazon S3 into Amazon Redshift. | | SysAdmin
For Parameters, provide the source and target details. | | SysAdmin
For Schedule, choose whether the pipeline runs on a schedule or on pipeline activation. | | SysAdmin
For Security/Access, leave the AWS Identity and Access Management (IAM) roles at their default values. | | SysAdmin
Activate your pipeline. | | SysAdmin
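The console steps above map roughly onto three Data Pipeline API calls: create the pipeline, upload a definition, and activate it. The sketch below assumes boto3; the helper names are hypothetical, and only the `Default` object is shown. It uses the default IAM role names and an on-demand schedule as an assumption; the S3 and Redshift objects that the console template generates are not reproduced here and would have to be supplied in `pipeline_objects`.

```python
def to_api_object(object_id, fields, refs=None):
    """Convert plain dicts into the field-list shape that put_pipeline_definition expects."""
    api_fields = [{"key": k, "stringValue": v} for k, v in fields.items()]
    api_fields += [{"key": k, "refValue": v} for k, v in (refs or {}).items()]
    return {"id": object_id, "name": object_id, "fields": api_fields}


def create_and_activate(client, name, unique_id, pipeline_objects):
    """Create a pipeline, upload its definition, and activate it."""
    pipeline_id = client.create_pipeline(name=name, uniqueId=unique_id)["pipelineId"]
    result = client.put_pipeline_definition(
        pipelineId=pipeline_id, pipelineObjects=pipeline_objects
    )
    if result.get("errored"):
        raise RuntimeError(f"definition rejected: {result.get('validationErrors')}")
    client.activate_pipeline(pipelineId=pipeline_id)
    return pipeline_id


# Default object with the default IAM roles; runs when the pipeline is activated.
default_object = to_api_object(
    "Default",
    {
        "scheduleType": "ondemand",
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole",
    },
)

# Usage sketch (requires AWS credentials; not executed here):
#   import boto3
#   client = boto3.client("datapipeline")
#   create_and_activate(client, "s3-to-redshift", "s3-to-redshift-001",
#                       [default_object, *other_objects])
```

Checking the `errored` flag before activation means a definition that fails validation stops the script instead of producing a pipeline that cannot run.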
Task | Description | Skills required
Delete the pipeline after data loading or your use case is complete. | | DBA, SysAdmin, AppOwner
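Cleanup can be scripted in the same way. This sketch assumes boto3 and a hypothetical helper, `delete_pipelines_named`, that finds pipelines by name and deletes them. It collects every matching ID first and deletes afterwards, so that deletion does not interfere with paging through the list.

```python
def delete_pipelines_named(client, name):
    """Delete every pipeline whose name matches, and return the deleted IDs."""
    matching_ids = []
    kwargs = {}
    while True:
        page = client.list_pipelines(**kwargs)
        matching_ids += [p["id"] for p in page["pipelineIdList"] if p["name"] == name]
        if not page.get("hasMoreItems"):
            break
        kwargs = {"marker": page["marker"]}

    for pipeline_id in matching_ids:
        client.delete_pipeline(pipelineId=pipeline_id)
    return matching_ids


# Usage sketch (requires AWS credentials; not executed here):
#   import boto3
#   delete_pipelines_named(boto3.client("datapipeline"), "s3-to-redshift")
```

Deleting a pipeline is permanent, so a dry run that only prints `matching_ids` may be worth doing first.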