COPY from columnar data formats
COPY can load data from Amazon S3 in the following columnar formats:
- ORC
- Parquet
For examples of using COPY from columnar data formats, see COPY examples.
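For orientation, here is a minimal sketch of a COPY command that loads Parquet data. The table name, bucket path, and IAM role are placeholders for illustration, not values from this page:

    -- Load Parquet files from a hypothetical S3 prefix into a hypothetical table.
    COPY listing
    FROM 's3://amzn-s3-demo-bucket/parquet/listing/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftRole'
    FORMAT AS PARQUET;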
COPY supports columnar formatted data with the following considerations:
- The Amazon S3 bucket must be in the same AWS Region as the Amazon Redshift database.
- To access your Amazon S3 data through a VPC endpoint, set up access using IAM policies and IAM roles as described in Using Amazon Redshift Spectrum with Enhanced VPC Routing in the Amazon Redshift Management Guide.
- COPY doesn't automatically apply compression encodings.
- Only the following COPY parameters are supported:
  - ACCEPTINVCHARS when copying from an ORC or Parquet file (see the first sketch after this list).
- If COPY encounters an error while loading, the command fails. ACCEPTANYDATE and MAXERROR aren't supported for columnar data types. Error messages are sent to the SQL client. Some errors are logged in STL_LOAD_ERRORS and STL_ERROR.
- COPY inserts values into the target table's columns in the same order as the columns occur in the columnar data files. The number of columns in the target table and the number of columns in the data file must match (see the second sketch after this list).
- If the file you specify for the COPY operation includes one of the following extensions, COPY decompresses the data without requiring any additional parameters:
  - .gz
  - .snappy
  - .bz2
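The following sketch shows ACCEPTINVCHARS on an ORC load, followed by a query against STL_LOAD_ERRORS to inspect a failed load. The table, S3 prefix, and IAM role are assumptions for illustration:

    -- Replace invalid UTF-8 characters with '?' while loading ORC data (hypothetical table and path).
    COPY event
    FROM 's3://amzn-s3-demo-bucket/orc/event/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftRole'
    FORMAT AS ORC
    ACCEPTINVCHARS;

    -- If the COPY fails, check the most recent load errors.
    SELECT query, filename, line_number, colname, err_reason
    FROM stl_load_errors
    ORDER BY starttime DESC
    LIMIT 10;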
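And a sketch of the column-order requirement: the target table below is a made-up example whose column count and order mirror the columns in the Parquet files. Any files under the prefix whose names end in .gz, .snappy, or .bz2 would be decompressed automatically, as noted above:

    -- Hypothetical table; its columns must match the file's columns in number and order.
    CREATE TABLE sales_staging (
      sale_id   BIGINT,
      sale_date DATE,
      amount    DECIMAL(12,2)
    );

    COPY sales_staging
    FROM 's3://amzn-s3-demo-bucket/parquet/sales/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/MyRedshiftRole'
    FORMAT AS PARQUET;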
COPY from the Parquet and ORC file formats uses Redshift Spectrum to access the Amazon S3 bucket. To use COPY for these formats, be sure there are no IAM policies blocking the use of Amazon S3 presigned URLs. The presigned URLs generated by Amazon Redshift are valid for 1 hour so that Amazon Redshift has enough time to load all the files from the Amazon S3 bucket. A unique presigned URL is generated for each file scanned by COPY from columnar data formats. For bucket policies that include an s3:signatureAge condition, make sure to set the value to at least 3,600,000 milliseconds. For more information, see Using Amazon Redshift Spectrum with enhanced VPC routing.