Menu
Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

COPY from Amazon S3

To load data from files located in one or more S3 buckets, use the FROM clause to indicate how COPY will locate the files in Amazon S3. You can provide the object path to the data files as part of the FROM clause, or you can provide the location of a manifest file that contains a list of Amazon S3 object paths. COPY from Amazon S3 uses an HTTPS connection.

Important

If the Amazon S3 buckets that hold the data files do not reside in the same region as your cluster, you must use the REGION parameter to specify the region in which the data is located.

Syntax

Copy
FROM { 's3://objectpath' | 's3://manifest_file' } authorization | MANIFEST | ENCRYPTED | REGION [AS] 'aws-region' | optional-parameters

Examples

The following example uses an object path to load data from Amazon S3.

Copy
copy customer from 's3://mybucket/customer' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';

The following example uses a manifest file to load data from Amazon S3.

Copy
copy customer from 's3://mybucket/cust.manifest' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' manifest;

Parameters

FROM

The source of the data to be loaded.

's3://copy_from_s3_objectpath'

Specifies the path to the Amazon S3 objects that contain the data—for example, 's3://mybucket/custdata.txt'. The s3://copy_from_s3_objectpath parameter can reference a single file or a set of objects or folders that have the same key prefix. For example, the name custdata.txt is a key prefix that refers to a number of physical files: custdata.txt,custdata.txt.1, custdata.txt.2, custdata.txt.bak,and so on. The key prefix can also reference a number of folders. For example, 's3://mybucket/custfolder' refers to the folders custfolder, custfolder_1, custfolder_2, and so on. If a key prefix references multiple folders, all of the files in the folders will be loaded. If a key prefix matches a file as well as a folder, such as custfolder.log, COPY attempts to load the file also. If a key prefix might result in COPY attempting to load unwanted files, use a manifest file. For more information, see copy_from_s3_manifest_file, following.

Important

If the S3 bucket that holds the data files does not reside in the same region as your cluster, you must use the REGION parameter to specify the region in which the data is located.

For more information, see Loading Data from Amazon S3.

's3://copy_from_s3_manifest_file'

Specifies the Amazon S3 object key for a manifest file that lists the data files to be loaded. The 's3://copy_from_s3_manifest_file' argument must explicitly reference a single file—for example, 's3://mybucket/manifest.txt'. It cannot reference a key prefix.

The manifest is a text file in JSON format that lists the URL of each file that is to be loaded from Amazon S3. The URL includes the bucket name and full object path for the file. The files that are specified in the manifest can be in different buckets, but all the buckets must be in the same region as the Amazon Redshift cluster. If a file is listed twice, the file is loaded twice. The following example shows the JSON for a manifest that loads three files.

Copy
{ "entries": [ {"url":"s3://mybucket-alpha/custdata.1","mandatory":true}, {"url":"s3://mybucket-alpha/custdata.2","mandatory":true}, {"url":"s3://mybucket-beta/custdata.1","mandatory":false} ] }

The double quote characters are required, and must be simple quotation marks (0x22), not slanted or "smart" quotes. Each entry in the manifest can optionally include a mandatory flag. If mandatory is set to true, COPY terminates if it does not find the file for that entry; otherwise, COPY will continue. The default value for mandatory is false.

The manifest file must not be encrypted or compressed, even if the ENCRYPTED, GZIP, LZOP, or BZIP2 options are specified. COPY returns an error if the specified manifest file is not found or the manifest file is not properly formed.

If a manifest file is used, the MANIFEST parameter must be specified with the COPY command. If the MANIFEST parameter is not specified, COPY assumes that the file specified with FROM is a data file.

For more information, see Loading Data from Amazon S3.

authorization

The COPY command needs authorization to access data in another AWS resource, including in Amazon S3, Amazon EMR, Amazon DynamoDB, and Amazon EC2. You can provide that authorization by referencing an AWS Identity and Access Management (IAM) role that is attached to your cluster (role-based access control) or by providing the access credentials for an IAM user (key-based access control). For increased security and flexibility, we recommend using IAM role-based access control. For more information, see Authorization Parameters.

MANIFEST

Specifies that a manifest is used to identify the data files to be loaded from Amazon S3. If the MANIFEST parameter is used, COPY loads data from the files listed in the manifest referenced by 's3://copy_from_s3_manifest_file'. If the manifest file is not found, or is not properly formed, COPY fails. For more information, see Using a Manifest to Specify Data Files.

ENCRYPTED

A clause that specifies that the input files on Amazon S3 are encrypted using client-side encryption with customer-managed symmetric keys (CSE-CMK). For more information, see Loading Encrypted Data Files from Amazon S3. Don't specify ENCRYPTED if the input files are encrypted using Amazon S3 server-side encryption (SSE-KMS or SSE-S3). COPY reads server-side encrypted files automatically.

If you specify the ENCRYPTED parameter, you must also specify the MASTER_SYMMETRIC_KEY parameter or include the master_symmetric_key value in the CREDENTIALS string.

If the encrypted files are in compressed format, add the GZIP, LZOP, or BZIP2 parameter.

MASTER_SYMMETRIC_KEY 'master_key'

The master symmetric key that was used to encrypt data files on Amazon S3. If MASTER_SYMMETRIC_KEY is specified, the ENCRYPTED parameter must also be specified. MASTER_SYMMETRIC_KEY can't be used with the CREDENTIALS parameter. For more information, see Loading Encrypted Data Files from Amazon S3.

If the encrypted files are in compressed format, add the GZIP, LZOP, or BZIP2 parameter.

REGION [AS] 'aws-region'

Specifies the AWS region where the source data is located. REGION is required for COPY from an Amazon S3 bucket or an DynamoDB table when the AWS resource that contains the data is not in the same region as the Amazon Redshift cluster.

The value for aws_region must match a region listed in the following table.

Region Name Region
US East (N. Virginia) us-east-1
US East (Ohio) us-east-2
US West (N. California) us-west-1
US West (Oregon) us-west-2
Asia Pacific (Seoul) ap-northeast-2
Asia Pacific (Mumbai) ap-south-1
Asia Pacific (Singapore) ap-southeast-1
Asia Pacific (Sydney) ap-southeast-2
Asia Pacific (Tokyo) ap-northeast-1
Canada (Central) ca-central-1
China (Beijing) cn-north-1
EU (Frankfurt) eu-central-1
EU (Ireland) eu-west-1
EU (London) eu-west-2
South America (São Paulo) sa-east-1
AWS GovCloud (US) us-gov-west-1

If the REGION parameter is specified, all resources, including a manifest file or multiple Amazon S3 buckets, must be located in the specified region.

Note

Transferring data across regions incurs additional charges against the Amazon S3 bucket or the DynamoDB table that contains the data. For more information about pricing, see Data Transfer OUT From Amazon S3 To Another AWS Region on the Amazon S3 Pricing page and Data Transfer OUT on the Amazon DynamoDB Pricing page.

By default, COPY assumes that the data is located in the same region as the Amazon Redshift cluster.

Optional Parameters

You can optionally specify the following parameters with COPY from Amazon S3:

Unsupported Parameters

You cannot use the following parameters with COPY from Amazon S3:

  • SSH

  • READRATIO