Amazon Redshift
Database Developer Guide (API Version 2012-12-01)

Loading Data from Amazon EMR

You can use the COPY command to load data in parallel from an Amazon EMR cluster configured to write text files to the cluster's Hadoop Distributed File System (HDFS) in the form of fixed-width files, character-delimited files, CSV files, or JSON-formatted files.
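As a rough illustration of the COPY command described above, the following Python sketch assembles a COPY statement that reads from an `emr://` source. The table name, cluster ID, HDFS path, and IAM role ARN are hypothetical placeholders, and the helper `build_emr_copy` is not part of any AWS library. It assumes IAM role authorization and pipe-delimited output files, so adjust the authorization and format clauses to match your cluster.

```python
def build_emr_copy(table, cluster_id, hdfs_path, iam_role_arn, delimiter="|"):
    """Return a COPY statement that loads `table` from files in an
    Amazon EMR cluster's HDFS.

    All argument values are supplied by the caller; nothing here is
    validated or escaped, so treat this as a sketch, not production code.
    """
    # The source has the form emr://<cluster-id>/<hdfs-file-path>.
    source = f"emr://{cluster_id}/{hdfs_path.lstrip('/')}"
    return (
        f"COPY {table}\n"
        f"FROM '{source}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"DELIMITER '{delimiter}';"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(
        build_emr_copy(
            table="sales",
            cluster_id="j-SAMPLE2B500FC",
            hdfs_path="/myoutput/part-*",
            iam_role_arn="arn:aws:iam::123456789012:role/MyRedshiftRole",
        )
    )
```

The wildcard in the path (`part-*`) lets a single COPY command pick up all of the output files that the cluster wrote to that HDFS folder, which is what allows the load to run in parallel.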

Amazon EMR provides a bootstrap action for output to Amazon Redshift that performs much of the preparation work for you. The bootstrap action must be specified when the Amazon EMR cluster is created. The Amazon Redshift bootstrap action is not available for Amazon EMR clusters created using the following AMI versions: 2.1.4, 2.2.4, 2.3.6.

The procedure for loading data from an Amazon EMR cluster depends on whether you use the Amazon Redshift bootstrap action. Follow the steps in the section that matches your choice.