Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Run a Hadoop Application to Process Data

Amazon EMR provides two models for creating custom Hadoop applications to process data:

  • Custom JAR or Cascading cluster: Write a Java application, optionally using the Cascading Java libraries, compile it into a JAR file, and upload the JAR file to Amazon S3. Amazon EMR imports the JAR file into the cluster and uses it to process data. The JAR file must contain implementations of both the map and reduce functions.

  • Streaming cluster: Write separate map and reduce scripts in one of several scripting languages and upload them to Amazon S3. Amazon EMR imports the scripts into the cluster and uses them to process data. Instead of providing a script, you can use a built-in Hadoop class, such as aggregate.

Whichever model you choose, the application must provide both map and reduce functionality and should follow Hadoop programming best practices.
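
As an illustration of the streaming model, the following is a minimal word-count sketch in Python, not an official Amazon EMR sample. It assumes the usual Hadoop Streaming contract: each script reads lines from standard input and writes tab-separated key/value records to standard output, and the reduce script receives its input sorted by key. The function names, the single-file layout, and the map, reduce, and aggregate-map command-line arguments are hypothetical choices made for this example. The third function shows the record format that the built-in aggregate class expects when it is used in place of a reduce script.

```python
#!/usr/bin/env python3
"""Word-count sketch for a streaming cluster (illustrative, not an AWS sample)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Map step: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reduce step: sum the counts for each word.

    Assumes the input records are already sorted by key, which is how
    Hadoop Streaming delivers map output to the reduce script.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def map_lines_for_aggregate(lines):
    """Map step for use with the built-in aggregate class.

    The 'LongValueSum:' prefix asks aggregate to sum the values per key,
    so no separate reduce script is needed.
    """
    for line in lines:
        for word in line.split():
            yield f"LongValueSum:{word.lower()}\t1"


# Hypothetical convention: the first argument selects which step to run.
STEPS = {
    "map": map_lines,
    "reduce": reduce_lines,
    "aggregate-map": map_lines_for_aggregate,
}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in STEPS:
    for record in STEPS[sys.argv[1]](sys.stdin):
        print(record)
```

Assuming the sketch is saved as wordcount.py (a hypothetical name), you can approximate the map, sort, and reduce sequence locally before uploading anything to Amazon S3 by piping a text file through "python3 wordcount.py map", then "sort", then "python3 wordcount.py reduce".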