Amazon EMR provides several models for creating custom Hadoop applications to process data:
Custom JAR or Cascading — write a Java application (optionally using the Cascading Java libraries), compile it into a JAR file, and upload the JAR file to Amazon S3, where it is imported into the cluster and used to process data. Your JAR file must contain implementations of both the Map and Reduce functionality.
Streaming — write separate Map and Reduce scripts in one of several scripting languages and upload them to Amazon S3, where they are imported into the cluster and used to process data. You can also use built-in Hadoop classes, such as aggregate, instead of providing a script.
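To illustrate the Streaming model, the following is a minimal sketch of a word-count job written in Python. It is a hypothetical example, not part of the EMR documentation: in a real Streaming job the mapper and reducer would be separate scripts (for example, mapper.py and reducer.py) that read from standard input and write tab-separated key-value pairs to standard output, and Hadoop would handle the sort between the two phases.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit one tab-separated (word, 1) pair per word, exactly as a
    # Hadoop Streaming mapper writes key-value pairs to stdout.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce phase, so
    # identical keys arrive consecutively; group them and sum counts.
    split_pairs = (p.split("\t") for p in pairs)
    for word, group in groupby(split_pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Simulate the map -> sort -> reduce pipeline locally:
    #   cat input.txt | ./mapper.py | sort | ./reducer.py
    for out in reducer(sorted(mapper(sys.stdin))):
        print(out)
```

Because Streaming communicates only through standard input and output, you can test the same logic locally with a shell pipeline before uploading the scripts to Amazon S3.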
Regardless of which type of custom application you create, the application must provide both Map and Reduce functionality, and should adhere to Hadoop programming best practices.