Copy CSV Data Between Amazon S3 Buckets Using AWS Data Pipeline

After you read What is AWS Data Pipeline? and decide that you want to use AWS Data Pipeline to automate the movement and transformation of your data, you can start creating data pipelines. To help you understand how AWS Data Pipeline works, let’s walk through a simple task.

This tutorial walks you through the process of creating a data pipeline to copy data from one Amazon S3 bucket to another and then send an Amazon SNS notification after the copy activity completes successfully. You use an EC2 instance managed by AWS Data Pipeline for this copy activity.
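The overall flow (create an empty pipeline, upload its object definitions, then activate it) can be sketched with the boto3 `datapipeline` client. This is a minimal sketch under stated assumptions, not the tutorial's own procedure, which uses the console or CLI. The function name `deploy_pipeline` is hypothetical, and the client is passed in as a parameter so the logic can be exercised with a stub instead of real AWS credentials.

```python
"""Minimal sketch: create, define, and activate a pipeline through the
AWS Data Pipeline API. deploy_pipeline is a hypothetical helper."""
from __future__ import annotations

from typing import Any


def deploy_pipeline(client: Any, name: str, unique_id: str,
                    pipeline_objects: list[dict]) -> str:
    """Create a pipeline, upload its definition, and activate it.

    `client` is assumed to behave like boto3.client("datapipeline").
    Returns the ID of the new pipeline.
    """
    # uniqueId lets the service recognize a retried create request.
    created = client.create_pipeline(name=name, uniqueId=unique_id)
    pipeline_id = created["pipelineId"]

    # Upload the object definitions; the service validates them and
    # reports problems through the "errored" flag.
    result = client.put_pipeline_definition(
        pipelineId=pipeline_id, pipelineObjects=pipeline_objects)
    if result.get("errored"):
        raise RuntimeError(
            f"definition rejected: {result.get('validationErrors')}")

    # Activation lets the pipeline start running on its schedule.
    client.activate_pipeline(pipelineId=pipeline_id)
    return pipeline_id
```

With real credentials, a call might look like `deploy_pipeline(boto3.client("datapipeline"), "CopyCSVPipeline", "copy-csv-example-1", objects)`, where the pipeline name, unique ID, and `objects` (a list of `{id, name, fields}` records) are placeholders. Stopping before activation when validation fails avoids starting a pipeline whose definition the service has rejected.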

Pipeline Objects

The pipeline uses the following objects:

CopyActivity: The activity that AWS Data Pipeline performs for this pipeline (copy CSV data from one Amazon S3 bucket to another).

Important
There are limitations when using the CSV file format with CopyActivity and S3DataNode. For more information, see CopyActivity.

Schedule: The start date, time, and recurrence for this activity. You can optionally specify the end date and time.
Ec2Resource: The resource (an EC2 instance) that AWS Data Pipeline uses to perform this activity.
S3DataNode: The input and output nodes (Amazon S3 buckets) for this pipeline.
SnsAlarm: The action that AWS Data Pipeline takes when the specified conditions are met (sending an Amazon SNS notification to a topic after the task finishes successfully).
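The objects above can be expressed as a pipeline definition. The following is a minimal sketch in Python, assuming the JSON pipeline-definition style (references between objects written as `{"ref": ...}`) and the `{id, name, fields}` record shape that the PutPipelineDefinition API accepts. All IDs, S3 paths, dates, the instance type, the IAM role names, and the topic ARN are hypothetical placeholders, and `to_api_objects` is an illustrative helper, not part of any AWS SDK. Because S3DataNode is used twice (once for input, once for output), the definition contains six objects.

```python
"""Sketch of the pipeline objects as a definition, plus a converter to
the field-list format used by PutPipelineDefinition.
All values below are hypothetical placeholders."""
from __future__ import annotations

DEFINITION = [
    {"id": "MySchedule", "type": "Schedule",
     "startDateTime": "2024-01-01T00:00:00",  # placeholder start
     "period": "1 day"},                       # endDateTime is optional
    {"id": "S3Input", "type": "S3DataNode",
     "schedule": {"ref": "MySchedule"},
     "filePath": "s3://example-source-bucket/input/data.csv"},
    {"id": "S3Output", "type": "S3DataNode",
     "schedule": {"ref": "MySchedule"},
     "filePath": "s3://example-dest-bucket/output/data.csv"},
    {"id": "MyEC2Resource", "type": "Ec2Resource",
     "schedule": {"ref": "MySchedule"},
     "instanceType": "m1.medium",               # placeholder type
     "role": "DataPipelineDefaultRole",
     "resourceRole": "DataPipelineDefaultResourceRole"},
    {"id": "CopySuccessNotify", "type": "SnsAlarm",
     "topicArn": "arn:aws:sns:us-east-1:111122223333:example-topic",
     "subject": "CSV copy succeeded",
     "message": "The CopyActivity finished successfully.",
     "role": "DataPipelineDefaultRole"},
    {"id": "MyCopyActivity", "type": "CopyActivity",
     "schedule": {"ref": "MySchedule"},
     "runsOn": {"ref": "MyEC2Resource"},
     "input": {"ref": "S3Input"},
     "output": {"ref": "S3Output"},
     "onSuccess": {"ref": "CopySuccessNotify"}},
]


def to_api_objects(definition: list[dict]) -> list[dict]:
    """Convert JSON-style objects into {id, name, fields} records.

    Plain values become stringValue fields and {"ref": ...} values
    become refValue fields. Every reference must name a defined id.
    """
    ids = {obj["id"] for obj in definition}
    api_objects = []
    for obj in definition:
        fields = []
        for key, value in obj.items():
            if key == "id":
                continue  # the id is a top-level attribute, not a field
            if isinstance(value, dict) and "ref" in value:
                if value["ref"] not in ids:
                    raise ValueError(
                        f"{obj['id']}.{key} -> unknown id {value['ref']}")
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(value)})
        api_objects.append(
            {"id": obj["id"], "name": obj["id"], "fields": fields})
    return api_objects
```

Checking references locally catches a misspelled object ID (for example, an `onSuccess` that points at a nonexistent SnsAlarm) before the definition is sent to the service.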
