Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

Tutorial: Loading Data from Amazon S3

This tutorial walks you through the complete process of loading data into your Amazon Redshift database tables from data files in an Amazon Simple Storage Service (Amazon S3) bucket.

In this tutorial, you will:

  • Download data files that use CSV, character-delimited, and fixed-width formats.

  • Create an Amazon S3 bucket and then upload the data files to the bucket.

  • Launch an Amazon Redshift cluster and create database tables.

  • Use COPY commands to load the tables from the data files on Amazon S3.

  • Troubleshoot load errors and modify your COPY commands to correct the errors.

Estimated time: 60 minutes

Estimated cost: $1.00 per hour for the cluster

Prerequisites

You need the following:

  • An AWS account to launch an Amazon Redshift cluster and to create a bucket in Amazon S3.

  • Your AWS credentials (an access key ID and secret access key) to load test data from Amazon S3. If you need to create new access keys, go to Administering Access Keys for IAM Users.

This tutorial is self-contained. For a more complete understanding of how to design and use Amazon Redshift databases, we also recommend completing the following tutorials:

  • Amazon Redshift Getting Started walks you through the process of creating an Amazon Redshift cluster and loading sample data.

  • Tutorial: Tuning Table Design walks you step by step through the process of designing and tuning tables, including choosing sort keys, distribution styles, and compression encodings, and evaluating system performance before and after tuning.

Overview

You can add data to your Amazon Redshift tables by using either an INSERT command or a COPY command. At the scale and speed of an Amazon Redshift data warehouse, the COPY command is many times faster and more efficient than INSERT commands.

The COPY command uses the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from multiple data sources. You can load from data files on Amazon S3, Amazon EMR, or any remote host accessible through a Secure Shell (SSH) connection, or you can load directly from an Amazon DynamoDB table.

In this tutorial, you will use the COPY command to load data from Amazon S3. Many of the principles presented here apply to loading from other data sources as well.
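
As a rough illustration of what a COPY command for Amazon S3 looks like, the following Python sketch assembles one as a string. The helper name build_copy_statement, the table name part, the bucket path, and the placeholder key values are all hypothetical and chosen only for this example. The sketch assumes key-based credentials (the access key ID and secret access key listed in the prerequisites) and only builds the statement text; it does not connect to a cluster or run anything.

```python
def build_copy_statement(table, s3_path, access_key_id, secret_access_key,
                         format_options=""):
    """Return the text of a COPY statement that loads `table` from `s3_path`.

    Hypothetical helper for illustration only: it builds a string and
    does not validate its inputs or execute the statement.
    """
    # Key-based credentials string built from the two access key values.
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}"
    )
    lines = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"CREDENTIALS '{credentials}'",
    ]
    # Optional data format parameters, for example CSV or DELIMITER '|'.
    if format_options:
        lines.append(format_options)
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Placeholder values; substitute your own table, bucket, and keys.
    print(build_copy_statement(
        table="part",
        s3_path="s3://<your-bucket-name>/load/part-csv.tbl",
        access_key_id="<Your-Access-Key-ID>",
        secret_access_key="<Your-Secret-Access-Key>",
        format_options="CSV",
    ))
```

You would run the resulting statement from a SQL client connected to your cluster. Keep real access keys out of source files and version control; the placeholders above are meant to be replaced at run time.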

To learn more about using the COPY command, see these resources:

Steps