Step 4: Load data from Amazon S3 to Amazon Redshift

After creating your cluster, you can load data from Amazon S3 to your database tables. There are multiple ways to load data from Amazon S3.

  • You can use a SQL client to run the SQL CREATE TABLE command to create a table in your database and then use the SQL COPY command to load data from Amazon S3. The Amazon Redshift query editor v2 is a SQL client.

  • You can use the Amazon Redshift query editor v2 load wizard.

This tutorial first demonstrates how to use the Amazon Redshift query editor v2 to run SQL commands to CREATE tables and COPY data. Launch query editor v2 from the Amazon Redshift console navigation pane. Within query editor v2, create a connection to the examplecluster cluster and the dev database with your admin user, awsuser. For this tutorial, choose Temporary credentials using a database user name when you create the connection. For details on using Amazon Redshift query editor v2, see Connecting to an Amazon Redshift database in the Amazon Redshift Management Guide.

Loading data from Amazon S3 using SQL commands

On the query editor v2 query editor pane, confirm that you are connected to the examplecluster cluster and the dev database. Next, create tables in the database and load data into them. For this tutorial, the data that you load is available in an Amazon S3 bucket that is accessible from many AWS Regions.

The following procedure creates tables and loads data from a public Amazon S3 bucket.

Use the Amazon Redshift query editor v2 to copy and run the following create table statement to create a table in the public schema of the dev database. For more information about the syntax, see CREATE TABLE in the Amazon Redshift Database Developer Guide.

To create and load data using a SQL client such as query editor v2
  1. Run the following SQL command to CREATE the sales table.

    drop table if exists sales;

    create table sales(
        salesid integer not null,
        listid integer not null distkey,
        sellerid integer not null,
        buyerid integer not null,
        eventid integer not null,
        dateid smallint not null sortkey,
        qtysold smallint not null,
        pricepaid decimal(8,2),
        commission decimal(8,2),
        saletime timestamp);
  2. Run the following SQL command to CREATE the date table.

    drop table if exists date;

    create table date(
        dateid smallint not null distkey sortkey,
        caldate date not null,
        day character(3) not null,
        week smallint not null,
        month character(5) not null,
        qtr character(5) not null,
        year smallint not null,
        holiday boolean default('N'));
  3. Load the sales table from Amazon S3 using the COPY command.

    Note

    We recommend using the COPY command to load large datasets into Amazon Redshift from Amazon S3. For more information about COPY syntax, see COPY in the Amazon Redshift Database Developer Guide.

    Provide authentication for your cluster to access Amazon S3 on your behalf to load the sample data. You provide authentication by referencing the IAM role that you created and set as the default for your cluster when you chose Create IAM role as default during cluster creation.

    Load the sales table using the following SQL command. You can optionally download and view the source data for the sales table from Amazon S3.

    COPY sales
    FROM 's3://redshift-downloads/tickit/sales_tab.txt'
    DELIMITER '\t'
    TIMEFORMAT 'MM/DD/YYYY HH:MI:SS'
    REGION 'us-east-1'
    IAM_ROLE default;
  4. Load the date table using the following SQL command. You can optionally download and view the source data for the date table from Amazon S3.

    COPY date
    FROM 's3://redshift-downloads/tickit/date2008_pipe.txt'
    DELIMITER '|'
    REGION 'us-east-1'
    IAM_ROLE default;
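If a COPY fails with a load error, it can help to check a few source rows locally. The following Python sketch parses one line in each file's declared format: tab-delimited with a 'MM/DD/YYYY HH:MI:SS' timestamp for the sales data, and pipe-delimited for the date data. The sample lines below are illustrative values shaped like the table definitions above, not rows taken from the actual files.

```python
from datetime import datetime

# Hypothetical line in the tab-delimited layout of sales_tab.txt;
# columns follow the sales table definition (salesid ... saletime).
sales_line = "1\t1\t36861\t21191\t7872\t1875\t4\t728.00\t109.20\t2/18/2008 02:36:48"
fields = sales_line.split("\t")

# Redshift's TIMEFORMAT 'MM/DD/YYYY HH:MI:SS' corresponds to this
# strptime pattern in Python.
saletime = datetime.strptime(fields[9], "%m/%d/%Y %H:%M:%S")

# Hypothetical line in the pipe-delimited layout of date2008_pipe.txt;
# columns follow the date table definition (dateid ... holiday).
date_line = "1827|2008-01-01|TUE|1|JAN|1|2008|Y"
dateid, caldate, day, week, month, qtr, year, holiday = date_line.split("|")

print(saletime.isoformat(), caldate, holiday)
```

Rows that don't split into the expected number of columns, or timestamps that don't match the pattern, are the kind of problem that shows up in the STL_LOAD_ERRORS system table after a failed COPY.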

Loading data from Amazon S3 using the query editor v2

The query editor v2 simplifies loading data when using the Load data wizard. The COPY command generated and used in the query editor v2 Load data wizard supports many of the parameters available in the COPY command syntax to load data from Amazon S3. For information about the COPY command and the options used to load data from Amazon S3, see COPY from Amazon Simple Storage Service in the Amazon Redshift Database Developer Guide.

To load your own data from Amazon S3 to Amazon Redshift, Amazon Redshift requires an IAM role with the privileges needed to load data from the specified Amazon S3 bucket.
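As a rough illustration (not an official policy document), the S3 permissions such a role typically needs are object reads plus bucket listing. The sketch below builds that policy as a Python dict; the bucket name is a placeholder, and the assumption is that standard s3:GetObject and s3:ListBucket actions suffice for a plain COPY from S3.

```python
import json

# Placeholder bucket name; substitute the bucket that holds your data.
bucket = "amzn-s3-demo-bucket"

# Minimal sketch of the S3 read permissions for the IAM role that
# Redshift assumes when it runs COPY on your behalf.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Read individual objects under the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        },
        {
            # List the bucket so COPY can resolve prefixes/manifests.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}"],
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Attach a policy like this to the role, and make sure the role's trust policy allows the Redshift service to assume it.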

For this tutorial, first open Amazon Redshift query editor v2 and connect to a database. Next, create the table that will contain the loaded data. Then load your own data from Amazon S3 to Amazon Redshift. For more information on how to work with the query editor v2, see Loading data into a database in the Amazon Redshift Management Guide.

Create TICKIT data in your cluster

You can create the full set of TICKIT tables and load data into your cluster in the following ways:

  • When you create a cluster in the Amazon Redshift console, you have the option to load sample TICKIT data at the same time. On the Amazon Redshift console, choose Clusters, Create cluster. In the Sample data section, select Load sample data. Amazon Redshift automatically loads its sample dataset into your Amazon Redshift cluster's dev database during cluster creation.

  • With Amazon Redshift query editor v2, you can load TICKIT data into a sample database named sample_data_dev. Navigate to the sample_data_dev database, then to tickit, and from the context menu choose Open sample notebooks. Amazon Redshift query editor v2 creates the sample database along with a sample notebook named tickit-sample-notebook. You can run this notebook to query data in the sample database.

To view details about the TICKIT data, see Sample database in the Amazon Redshift Database Developer Guide.