Menu
Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

COPY

Loads data into a table from data files or from an Amazon DynamoDB table. The files can be located in an Amazon Simple Storage Service (Amazon S3) bucket, an Amazon Elastic MapReduce (Amazon EMR) cluster, or a remote host that is accessed using a Secure Shell (SSH) connection.

The COPY command appends the new input data to any existing rows in the table.

The maximum size of a single input row from any source is 4 MB.

Note

To use the COPY command, you must have INSERT privilege for the Amazon Redshift table.

COPY Syntax

COPY table-name 
[ column-list ]
FROM data_source
[ WITH ] CREDENTIALS [AS] 'aws-auth-args'
[ [ FORMAT ] [ AS ] data_format ] 
[ [ parameter [ argument ] [, ... ] ]

COPY Syntax Overview

You can perform a COPY operation as few as three parameters: a table name, a data source, and authorization to access the data.

Amazon Redshift extends the functionality of the COPY command to enable you to load data in several data formats from multiple data sources, control access to load data, manage data transformations, and manage the load operation.

This section presents the required COPY command parameters and groups the optional parameters by function. Subsequent topics describe each parameter and explain how various options work together. You can also go directly to a parameter description by using the alphabetical parameter list.

Required Parameters

The COPY command requires only three parameters:

The simplest COPY command uses the following format.

COPY table-name 
FROM data-source
CREDENTIALS 'aws-auth-args';

The following example creates a table named CATDEMO, and then loads the data with sample data from a data file in Amazon S3 named category_pipe.txt.

create table catdemo(catid smallint, catgroup varchar(10), catname varchar(10), catdesc varchar(50));

In the follow example, the data source for the COPY command is a data file named category_pipe.txt in the tickit folder of an Amazon S3 bucket named awssampledbuswest2. The COPY command is authorized to access the Amazon S3 bucket through an AWS Identity and Access Management (IAM) role. If your cluster has an existing IAM role with permission to access Amazon S3 attached, you can substitute your role's Amazon Resource Name (ARN) in the following COPY command and execute it.

copy catdemo
from 's3://awssampledbuswest2/tickit/category_pipe.txt'
credentials 'aws_iam_role=arn:aws:iam::<aws-account-id>:role/<role-name>'
region 'us-west-2';

For steps to create an IAM role, see Step 2: Create an IAM Role . For complete instructions on how to use COPY commands to load sample data, including instructions for loading data from other AWS regions, see Step 4: Load Sample Data.

table-name

The name of the target table for the COPY command. The table must already exist in the database. The table can be temporary or persistent. The COPY command appends the new input data to any existing rows in the table.

FROM data-source

The location of the source data to be loaded into the target table.

The most commonly used data repository is an Amazon S3 bucket. You can also load from data files located in an Amazon EMR cluster, an Amazon EC2 instance, or a remote host that your cluster can access using an SSH connection, or you can load directly from a DynamoDB table.

CREDENTIALS 'aws-auth-args'

A clause that indicates the method that your cluster will use for authentication and authorization to access other AWS resources. The COPY command needs authorization to access data in another AWS resource, including Amazon S3, Amazon EMR, Amazon DynamoDB, and Amazon EC2. You can provide that authorization by referencing an IAM role that is attached to your cluster or by providing the access key ID and secret access key for an IAM user.

If your data is encrypted on Amazon S3 using client-side encryption, you supply the encryption as part of the aws-auth-args parameter string.

Optional Parameters

You can optionally specify how COPY will map field data to columns in the target table, define source data attributes to enable the COPY command to correctly read and parse the source data, and manage which operations the COPY command performs during the load process.

Column Mapping

By default, COPY inserts field values into the target table's columns in the same order as the fields occur in the data files. If the default column order will not work, you can specify a column list or use JSONPath expressions to map source data fields to the target columns.

Data Format Parameters

You can load data from text files in fixed-width, character-delimited, comma-separated values (CSV), or JSON format, or from Avro files. 

By default, the COPY command expects the source data to be in character-delimited UTF-8 text files. The default delimiter is a pipe character ( | ). If the source data is in another format, use the following parameters to specify the data format.

Data Conversion Parameters

As it loads the table, COPY attempts to implicitly convert the strings in the source data to the data type of the target column. If you need to specify a conversion that is different from the default behavior, or if the default conversion results in errors, you can manage data conversions by specifying the following parameters.

Data Load Operations

Manage the default behavior of the load operation for troubleshooting or to reduce load times by specifying the following parameters.