Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

UNLOAD

Unloads the result of a query to one or more files on Amazon S3, using Amazon S3 server-side encryption (SSE).

Syntax

UNLOAD ('select_statement')
TO 's3://object_path_prefix'
[ WITH ] CREDENTIALS [AS] 'aws_access_credentials' 
[ option [ ... ] ]

where option is

{ MANIFEST
| DELIMITER [ AS ] 'delimiter_char'
| FIXEDWIDTH [ AS ] 'fixedwidth_spec'
| ENCRYPTED
| GZIP
| ADDQUOTES
| NULL [ AS ] 'null_string'
| ESCAPE
| ALLOWOVERWRITE
| PARALLEL [ { ON | TRUE } | { OFF | FALSE } ] }

Parameters

('select_statement')

Defines a SELECT query. The results of the query are unloaded. In most cases, it is worthwhile to unload data in sorted order by specifying an ORDER BY clause in the query; this approach will save the time required to sort the data when it is reloaded.

The query must be enclosed in single quotes. For example:

('select * from venue order by venueid')

Note

If your query contains quotes (enclosing literal values, for example), you need to escape them in the query text. For example:

('select * from venue where venuestate=\'NV\'')

TO 's3://object_path/name_prefix'

The full path, including bucket name, to the location on Amazon S3 where Amazon Redshift will write the output file objects, including the manifest file if MANIFEST is specified. The object names are prefixed with name_prefix. For added security, UNLOAD connects to Amazon S3 using an HTTPS connection. By default, UNLOAD writes one or more files per slice. File names are written in the format <object_path>/<name_prefix><slice-number>_part_<file-number>. If MANIFEST is specified, the manifest file is written in the format <object_path>/<name_prefix>manifest.
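The default naming scheme can be sketched as follows. This helper is illustrative only, and the zero-padded widths of the slice and file numbers are assumptions modeled on documented examples, not an exact specification:

```python
def unload_object_names(name_prefix, slices, files_per_slice=1):
    # Illustrative sketch of the default UNLOAD naming scheme:
    # <name_prefix><slice-number>_part_<file-number>.
    # The zero-padding widths are assumptions, not a guarantee.
    names = []
    for s in range(slices):
        for f in range(files_per_slice):
            names.append('%s%04d_part_%02d' % (name_prefix, s, f))
    return names

# A two-slice cluster writing one file per slice:
print(unload_object_names('venue_', 2))
```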

UNLOAD automatically creates encrypted files using Amazon S3 server-side encryption (SSE). The COPY command automatically reads server-side encrypted files during the load operation. You can transparently download server-side encrypted files from your bucket using either the Amazon S3 management console or APIs. For more information, go to Protecting Data Using Server-Side Encryption.

To use Amazon S3 client-side encryption, specify the ENCRYPTED option.

Important

The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.

WITH

This keyword is optional.

CREDENTIALS [AS] 'aws_access_credentials'

The AWS access credentials for the Amazon S3 bucket. The access credentials must belong to an AWS account user or an IAM user with READ and WRITE permission for the Amazon S3 bucket to which the data files are being unloaded.

You can optionally use temporary credentials to access the Amazon S3 bucket. If you use temporary credentials, include the temporary session token in the credentials string. For more information, see Temporary Security Credentials in Usage notes for the COPY command.

The aws_access_credentials string must not contain spaces.

If only access key and secret access key are supplied, the aws_access_credentials string is in the format:

'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'

To use temporary token credentials, you must provide the temporary access key ID, the temporary secret access key, and the temporary token. The aws_access_credentials string is in the format

'aws_access_key_id=<temporary-access-key-id>;aws_secret_access_key=<temporary-secret-access-key>;token=<temporary-token>'

If the ENCRYPTED option is used, the aws_access_credentials string is in the format

'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>;master_symmetric_key=<master_key>'

where <master_key> is the value of the master key that UNLOAD will use to encrypt the files. The master key must be a base64-encoded 256-bit AES symmetric key.
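The three credential-string formats above differ only in which key=value pairs are present. A minimal sketch that assembles them (build_credentials is a hypothetical helper for illustration, not part of any AWS SDK):

```python
def build_credentials(access_key_id, secret_access_key,
                      token=None, master_symmetric_key=None):
    # Hypothetical helper: assembles the semicolon-delimited credentials
    # string described above. The resulting string must not contain spaces.
    parts = [
        'aws_access_key_id=' + access_key_id,
        'aws_secret_access_key=' + secret_access_key,
    ]
    if token is not None:
        parts.append('token=' + token)
    if master_symmetric_key is not None:
        parts.append('master_symmetric_key=' + master_symmetric_key)
    return ';'.join(parts)

# Basic access key / secret key form:
print(build_credentials('<access-key-id>', '<secret-access-key>'))
```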

MANIFEST

Creates a manifest file that explicitly lists the data files that are created by the UNLOAD process. The manifest is a text file in JSON format that lists the URL of each file that was written to Amazon S3. The manifest file is written to the same Amazon S3 path prefix as the unload files, in the format <object_path_prefix>manifest. For example, if the UNLOAD specifies the Amazon S3 path prefix 's3://mybucket/venue_', the manifest file location will be 's3://mybucket/venue_manifest'.
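The manifest itself is a small JSON object with an entries array containing one URL per unloaded file. A minimal example for a two-slice unload (the bucket and object names are illustrative):

```json
{
  "entries": [
    { "url": "s3://mybucket/venue_0000_part_00" },
    { "url": "s3://mybucket/venue_0001_part_00" }
  ]
}
```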

DELIMITER AS 'delimiter_character'

Single ASCII character that is used to separate fields in the output file, such as a pipe character ( | ), a comma ( , ), or a tab ( \t ). The default delimiter is a pipe character. The AS keyword is optional. DELIMITER cannot be used with FIXEDWIDTH. If the data contains the delimiter character, you will need to specify the ESCAPE option to escape the delimiter, or use ADDQUOTES to enclose the data in double quotes. Alternatively, specify a delimiter that is not contained in the data.

FIXEDWIDTH 'fixedwidth_spec'

Unloads the data to a file where each column width is a fixed length, rather than separated by a delimiter. The fixedwidth_spec is a string that specifies the number of columns and the width of the columns. The AS keyword is optional. FIXEDWIDTH cannot be used with DELIMITER. Because FIXEDWIDTH does not truncate data, the specification for each column in the UNLOAD statement needs to be at least as long as the length of the longest entry for that column. The format for fixedwidth_spec is shown below:

'colID1:colWidth1,colID2:colWidth2, ...'
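Because FIXEDWIDTH does not truncate, each width must cover the longest value in its column. A sketch of how you might derive a fixedwidth_spec from sample data (the helper name and the sample rows are hypothetical):

```python
def fixedwidth_spec(rows, column_ids):
    # Width of each column = length of its longest value, so that
    # FIXEDWIDTH never needs to truncate any entry.
    widths = [
        max(len(str(row[i])) for row in rows)
        for i in range(len(column_ids))
    ]
    return ','.join('%s:%d' % (c, w) for c, w in zip(column_ids, widths))

# Hypothetical sample rows from the venue table:
rows = [(1, 'Toyota Park'), (2, 'Columbus Crew Stadium')]
print(fixedwidth_spec(rows, ['venueid', 'venuename']))
```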

ENCRYPTED

Specifies that the output files on Amazon S3 will be encrypted using Amazon S3 client-side encryption. For more information, see Unloading Encrypted Data Files. UNLOAD automatically creates encrypted files using Amazon S3 server-side encryption with AWS-managed encryption keys (SSE). UNLOAD does not support Amazon S3 server-side encryption with a customer-supplied key (SSE-C). To unload to encrypted gzip-compressed files, add the GZIP option.

GZIP

Unloads data to one or more gzip-compressed files per slice. Each resulting file is appended with a .gz extension.

ADDQUOTES

Places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself. For example, if the delimiter is a comma, you could unload and reload the following data successfully:

 "1","Hello, World" 

Without the added quotes, the string Hello, World would be parsed as two separate fields.

If you use ADDQUOTES, you must specify REMOVEQUOTES in the COPY command if you reload the data.

NULL AS 'null_string'

Specifies a string that represents a null value in unload files. If this option is used, all output files contain the specified string in place of any null values found in the selected data. If this option is not specified, null values are unloaded as:

  • Zero-length strings for delimited output

  • Whitespace strings for fixed-width output

If a null string is specified for a fixed-width unload and the width of an output column is less than the width of the null string, the following behavior occurs:

  • An empty field is output for non-character columns

  • An error is reported for character columns

ESCAPE

For CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:

  • Linefeed: \n

  • Carriage return: \r

  • The delimiter character specified for the unloaded data

  • The escape character: \

  • A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command)
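These rules can be modeled in a few lines. The following is only an illustration of the documented behavior, not Amazon Redshift's implementation:

```python
def escape_field(value, delimiter='|', addquotes=False):
    # Model of the ESCAPE rules listed above: backslash-escape linefeed,
    # carriage return, the delimiter, the escape character itself, and
    # (when ADDQUOTES is also used) quote characters.
    special = {'\n', '\r', '\\', delimiter}
    if addquotes:
        special |= {'"', "'"}
    return ''.join('\\' + ch if ch in special else ch for ch in value)

print(escape_field('Hello|World'))  # the delimiter gets a backslash
```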

Important

If you loaded your data using a COPY with the ESCAPE option, you must also specify the ESCAPE option with your UNLOAD command to generate the reciprocal output file. Similarly, if you UNLOAD using the ESCAPE option, you will need to use ESCAPE when you COPY the same data.

ALLOWOVERWRITE

By default, UNLOAD fails if it finds files that it would possibly overwrite. If ALLOWOVERWRITE is specified, UNLOAD will overwrite existing files, including the manifest file.

PARALLEL

By default, UNLOAD writes data in parallel to multiple files, according to the number of slices in the cluster. The default option is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or more data files serially, sorted absolutely according to the ORDER BY clause, if one is used. The maximum size for a data file is 6.2 GB. So, for example, if you unload 13.4 GB of data, UNLOAD creates the following three files.

s3://mybucket/key000    6.2 GB
s3://mybucket/key001    6.2 GB
s3://mybucket/key002    1.0 GB
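The split shown above follows directly from the 6.2 GB per-file cap: each file is filled to the cap and the last holds the remainder. A sketch of that arithmetic (the cap value comes from the text; the helper itself is illustrative):

```python
def serial_file_sizes(total_gb, max_file_gb=6.2):
    # With PARALLEL OFF, UNLOAD fills one file up to the cap,
    # then starts the next; the last file holds the remainder.
    sizes = []
    remaining = total_gb
    while remaining > 0:
        sizes.append(min(max_file_gb, remaining))
        remaining = round(remaining - max_file_gb, 1)
    return sizes

print(serial_file_sizes(13.4))  # three files, as in the example above
```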

Note

The UNLOAD command is designed to use parallel processing. We recommend leaving PARALLEL enabled for most cases, especially if the files will be used to load tables using a COPY command.

Usage Notes

Using ESCAPE for All Delimited UNLOAD Operations

When you UNLOAD using a delimiter and there is any possibility that your data includes the delimiter or any of the characters listed in the ESCAPE option description, you must use the ESCAPE option with the UNLOAD statement. If you do not use the ESCAPE option with the UNLOAD, subsequent COPY operations using the unloaded data might fail.

Important

We strongly recommend that you always use ESCAPE with both UNLOAD and COPY statements unless you are certain that your data does not contain any delimiters or other characters that might need to be escaped.

Loss of Floating-Point Precision

You might encounter loss of precision for floating-point data that is successively unloaded and reloaded.

Limit Clause

The SELECT query cannot use a LIMIT clause in the outer SELECT. For example, the following UNLOAD statement will fail:

unload ('select * from venue limit 10') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'; 

Instead, use a nested LIMIT clause. For example:

unload ('select * from venue where venueid in 
(select venueid from venue order by venueid desc limit 10)') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

Alternatively, you could populate a table using SELECT…INTO or CREATE TABLE AS using a LIMIT clause, then unload from that table.