Copy Data to Amazon Redshift Using the Command Line

This tutorial demonstrates how to copy data from Amazon S3 to Amazon Redshift. You'll create a new table in Amazon Redshift, and then use AWS Data Pipeline to transfer data to this table from a public Amazon S3 bucket that contains sample input data in CSV format. AWS Data Pipeline saves its log output to an Amazon S3 bucket that you own.

Amazon S3 is a web service that enables you to store data in the cloud. For more information, see the Amazon Simple Storage Service User Guide. Amazon Redshift is a data warehouse service in the cloud. For more information, see the Amazon Redshift Management Guide.

Prerequisites

Before you begin, you must complete the following steps:

  1. Install and configure a command line interface (CLI). For more information, see Accessing AWS Data Pipeline. A configuration sketch follows this list.

  2. Ensure that the IAM roles named DataPipelineDefaultRole and DataPipelineDefaultResourceRole exist. The AWS Data Pipeline console creates these roles for you automatically; if you have never used the console, you must create them manually. For more information, see IAM Roles for AWS Data Pipeline. A quick existence check is sketched after this list.

  3. Set up the COPY command in Amazon Redshift, because the same COPY options must work when AWS Data Pipeline performs the copy on your behalf. For more information, see Before You Begin: Configure COPY Options and Load Data. A test COPY is sketched after this list.

  4. Set up an Amazon Redshift database. For more information, see Set up Pipeline, Create a Security Group, and Create an Amazon Redshift Cluster. A cluster-creation sketch follows this list.
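
For step 1, the following is a minimal sketch assuming you are using the AWS CLI (the aws command); if you use a different AWS Data Pipeline CLI, the commands will differ. aws configure prompts for your credentials and default Region, and list-pipelines confirms that the CLI can reach the AWS Data Pipeline service.

    # Enter your access key, secret key, and default Region when prompted.
    aws configure

    # Verify connectivity; returns an empty list if you have no pipelines yet.
    aws datapipeline list-pipelines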
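
For step 2, you can ask IAM for each role directly, assuming the AWS CLI is configured as above. Each command prints the role's details if it exists; a NoSuchEntity error means the role is missing and must be created manually (or by visiting the AWS Data Pipeline console once).

    # Check that both required roles exist.
    aws iam get-role --role-name DataPipelineDefaultRole
    aws iam get-role --role-name DataPipelineDefaultResourceRole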
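
For step 3, one way to confirm your COPY options is to run the COPY statement yourself before handing it to AWS Data Pipeline. The sketch below is illustrative only: the cluster endpoint, database, user, table, bucket, and credentials are all placeholders, and the CSV option is used because the sample input data is in CSV format.

    # Run a test COPY against the cluster with psql (all values below
    # are placeholders; substitute your own endpoint and credentials).
    psql -h examplecluster.abc123xyz.us-east-1.redshift.amazonaws.com \
         -p 5439 -U masteruser -d dev \
         -c "COPY example_table
             FROM 's3://example-bucket/input/data.csv'
             CREDENTIALS 'aws_access_key_id=<ID>;aws_secret_access_key=<KEY>'
             CSV;"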
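
For step 4, if you do not already have a cluster, the following sketch creates a minimal single-node cluster with the AWS CLI. The identifiers and password are placeholders, the node type must be one available in your Region, and the security group configuration described in the referenced topic still applies.

    # Create a minimal single-node cluster (values are placeholders).
    aws redshift create-cluster \
        --cluster-identifier example-cluster \
        --cluster-type single-node \
        --node-type dc2.large \
        --master-username masteruser \
        --master-user-password ExamplePassword1 \
        --db-name dev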