Loads data into a table from data files or from an Amazon DynamoDB table. The files can be located in an Amazon Simple Storage Service (Amazon S3) bucket, an Amazon Elastic MapReduce (Amazon EMR) cluster, or a remote host that is accessed using a Secure Shell (SSH) connection.
The COPY command appends the new input data to any existing rows in the table.
The maximum size of a single input row from any source is 4 MB.
To use the COPY command, you must have INSERT privilege for the Amazon Redshift table.
COPY table_name [ column_list ] FROM data_source [ WITH ] CREDENTIALS [AS] 'aws_access_credentials' [ [ FORMAT ] [ AS ] data_format ] [ [ parameter [ argument ] [, ... ] ]
COPY Syntax Overview
You can perform a COPY operation with only three parameters: a table name, an Amazon S3 bucket name, and your AWS credentials.
Amazon Redshift extends the functionality of the COPY command to enable you to load data in several data formats from multiple data sources, control access to load data, manage data transformations, and manage the load operation.
This section presents the required COPY command parameters and groups the optional parameters by function. Subsequent topics describe each parameter and explain how various options work together. You can also go directly to a parameter description by using the alphabetical parameter list.
The COPY command requires only three parameters:
The simplest COPY command uses the following format.
COPY table_name FROM data_source CREDENTIALS 'aws_access_credentials';
The following example creates a table named CATDEMO and then loads the data with
sample data from a data file in Amazon S3 named
create table catdemo(catid smallint, catgroup varchar(10), catname varchar(10), catdesc varchar(50)); copy catdemo from 's3://awssampledbuswest2/tickit/category_pipe.txt' credentials 'aws_access_key_id=
If your cluster is located in the US West (Oregon) AWS region, you can substitute your AWS credentials in the previous COPY command and execute it. For complete instructions on how to load sample data, including instructions for loading data from other AWS regions, see Step 4: Load Sample Data.
The name of the target table for the COPY command. The table must already exist in the database. The table can be temporary or persistent. The COPY command appends the new input data to any existing rows in the table.
- FROM data_source
The location of the source data to be loaded into the target table.
The most commonly used data repository is an Amazon S3 bucket. You can also load from data files located in an Amazon EMR cluster or a remote host that your cluster can access using an SSH connection, or you can load directly from a DynamoDB table.
- CREDENTIALS 'aws_access_credentials'
The COPY command needs valid AWS user credentials to access data in another AWS repository, including Amazon S3, Amazon EMR, and DynamoDB. For increased security and flexibility, we recommend using AWS Identity and Access Management (IAM) credentials. COPY can also use temporary credentials to limit access to your load data, and you can encrypt your load data on Amazon S3.
You can optionally specify how COPY will map field data to columns in the target table, define source data attributes to enable the COPY command to correctly read and parse the source data, and manage which operations the COPY command performs during the load process.
By default, COPY inserts field values into the target table's columns in the same order as the fields occur in the data files. If the default column order will not work, you can specify a column list or use JSONPath expressions to map source data fields to the target columns.
Data Format Parameters
You can load data from text files in fixed-width, character-delimited, comma-separated values (CSV), or JSON format, or from Avro files.
By default, the COPY command expects the source data to be in character-delimited UTF-8 text files. The default delimiter is a pipe character ( | ). If the source data is in another format, use the following parameters to specify the data format.
Data Conversion Parameters
As it loads the table, COPY attempts to implicitly convert the strings in the source data to the data type of the target column. If you need to specify a conversion that is different from the default behavior, or if the default conversion results in errors, you can manage data conversions by specifying the following parameters.
Using the COPY Command
For more information about how to use the COPY command, see the following topics: