Loading data from Amazon S3 - Amazon Redshift

Loading data from Amazon S3

The COPY command uses the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from files in an Amazon S3 bucket. To take maximum advantage of parallel processing, split your data into multiple files and set distribution keys on your tables. For more information about distribution keys, see Working with data distribution styles.
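As an illustration of splitting data before upload, the following Python sketch distributes the lines of a data set across a fixed number of parts of nearly equal size. The function names, the `<prefix>.<n>` file naming, and the sample rows are assumptions made for this example; they are not part of Amazon Redshift or any AWS tool.

```python
from pathlib import Path


def split_into_parts(lines, num_parts):
    """Distribute lines round-robin so part sizes differ by at most one line."""
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    parts = [[] for _ in range(num_parts)]
    for index, line in enumerate(lines):
        parts[index % num_parts].append(line)
    return parts


def write_parts(parts, directory, prefix):
    """Write each part to <directory>/<prefix>.<n> (numbered from 1); return the paths.

    The naming scheme is a hypothetical convention chosen so that all parts
    share a common key prefix once uploaded.
    """
    paths = []
    for number, part in enumerate(parts, start=1):
        path = Path(directory) / f"{prefix}.{number}"
        path.write_text("".join(f"{line}\n" for line in part), encoding="utf-8")
        paths.append(path)
    return paths


# Ten hypothetical pipe-delimited rows, split into four parts.
rows = [f"{i}|name{i}" for i in range(10)]
parts = split_into_parts(rows, 4)
```

Round-robin assignment keeps the parts close to the same size but does not preserve the original line order within a single file. If order within each file matters, contiguous chunks are an alternative.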

Data from the files is loaded into the target table, one line per row. The fields in the data file are matched to table columns in order, left to right. Fields in the data files can be fixed-width or character-delimited; the default delimiter is a pipe (|). By default, all the table columns are loaded, but you can optionally specify a comma-separated list of columns. If a table column is not included in the column list specified in the COPY command, it is loaded with a default value. For more information, see Loading default column values.
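To make the column list and delimiter options concrete, the following Python sketch assembles the text of a COPY statement. The helper `build_copy_statement` is hypothetical, and the table name, column names, bucket, and IAM role ARN are placeholders; only the COPY clauses themselves (the optional column list, FROM, IAM_ROLE, and DELIMITER) come from the COPY command syntax.

```python
def build_copy_statement(table, s3_path, iam_role, columns=None, delimiter="|"):
    """Return the text of a COPY statement for character-delimited files.

    Hypothetical helper for illustration. Values are interpolated directly,
    so pass only trusted identifiers and paths.
    """
    # With no column list, COPY loads all table columns in order.
    column_list = f" ({', '.join(columns)})" if columns else ""
    return (
        f"COPY {table}{column_list}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"DELIMITER '{delimiter}';"
    )


# All names below are placeholders, not real resources.
statement = build_copy_statement(
    table="venue",
    s3_path="s3://amzn-s3-demo-bucket/data/venue.txt",
    iam_role="arn:aws:iam::123456789012:role/MyRedshiftRole",
    columns=["venueid", "venuename"],
)
print(statement)
```

In this sketch, any table column left out of `columns` would receive its default value when the statement runs. Because the FROM path is treated as a key prefix, one statement can cover a set of split files that share that prefix.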

Follow this general process to load data from Amazon S3:

  1. Split your data into multiple files.

  2. Upload your files to Amazon S3.

  3. Run a COPY command to load the table.

  4. Verify that the data was loaded correctly.

The rest of this section explains these steps in detail.