Using the COPY Command to Load from Amazon S3
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
The manifest file is a JSON-formatted file that lists the data files to be loaded. The syntax to specify the files to be loaded by using a manifest file is as follows:Copy
copy <table_name> from 's3://<bucket_name>/<object_prefix>' authorization;
copy <table_name> from 's3://<bucket_name>/<manifest_file>' authorization manifest;
The table to be loaded must already exist in the database. For information about creating a table, see CREATE TABLE in the SQL Reference.
The values for authorization provide the AWS authorization your cluster needs to access the Amazon S3 objects. For information about required permissions, see IAM Permissions for COPY, UNLOAD, and CREATE LIBRARY. The preferred method for authentication is to specify the IAM_ROLE parameter and provide the Amazon Resource Name (ARN) for an IAM role with the necessary permissions. Alternatively, you can specify the ACCESS_KEY_ID and SECRET_ACCESS_KEY parameters and provide the access key ID and secret access key for an authorized IAM user as plain text. For more information, see Role-Based Access Control or Key-Based Access Control.
To authenticate using the IAM_ROLE parameter, replace
<role-name> as shown in the following syntax.
The following example shows authentication using an IAM role.
copy customer from 's3://mybucket/mydata' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
To authenticate using IAM user credentials, replace
<secret-access-key with an authorized user's access
key ID and full secret access key for the ACCESS_KEY_ID and SECRET_ACCESS_KEY
parameters as shown following.
<access-key-id>' SECRET_ACCESS_KEY '
The following example shows authentication using IAM user credentials.
copy customer from 's3://mybucket/mydata' access_key_id '
<access-key-id>' secret_access_key '
For more information about other authorization options, see Authorization Parameters
If you want to validate your data without actually loading the table, use the NOLOAD option with the COPY command.
The following example shows the first few rows of a pipe-delimited data in a file
1|Toyota Park|Bridgeview|IL|0 2|Columbus Crew Stadium|Columbus|OH|0 3|RFK Stadium|Washington|DC|0
Before uploading the file to Amazon S3, split the file into multiple files so that the COPY command can load it using parallel processing. The number of files should be a multiple of the number of slices in your cluster. Split your load data files so that the files are about equal size, between 1 MB and 1 GB after compression. For more information, see Splitting Your Data into Multiple Files.
For example, the
venue.txt file might be split into four files, as
venue.txt.1 venue.txt.2 venue.txt.3 venue.txt.4
The following COPY command loads the VENUE table using the pipe-delimited data in
the data files with the prefix 'venue' in the Amazon S3 bucket
The Amazon S3 bucket
mybucket in the following examples does not
exist. For sample COPY commands that use real data in an existing Amazon S3 bucket,
see Step 4: Load Sample Data.
copy venue from 's3://mybucket/venue' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' delimiter '|';
If no Amazon S3 objects with the key prefix 'venue' exist, the load fails.