Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Upgrading to Hadoop 1.0

This section describes how to upgrade your Amazon Elastic MapReduce (Amazon EMR) deployment to Hadoop 1.0.3.

Note

The following information applies to Hadoop 0.20 and later, including Hadoop 1.0.3.

Many Hadoop jobs that run successfully on Hadoop 0.18 run without modification on Hadoop 0.20 and later. However, before you engage in a full upgrade, we recommend recompiling your Hadoop jobs against Hadoop 1.0.3 and testing on small subsets of your data.

Streaming jobs should also work without modification, but we recommend using the new streaming parameters introduced with version 0.20. These are summarized in the following table.

Hadoop 0.18      Hadoop 0.20    Type
-cacheFile       -files         Comma-separated URIs
-cacheArchive    -archives      Comma-separated URIs
-jobconf         -D             key=value
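
The parameter mapping in the table above can be applied mechanically to an existing streaming argument list. The following Python sketch is illustrative only and is not part of Amazon EMR or Hadoop: the function names, the bucket name, and the file names are hypothetical. It assumes that each old flag is followed by exactly one value, that repeated -cacheFile or -cacheArchive values can be merged into one comma-separated list, and that the generic options (-D, -files, -archives) belong ahead of the streaming-specific options.

```python
# Old flags whose values are merged into one comma-separated list (from the table).
MERGED_FLAGS = {"-cacheFile": "-files", "-cacheArchive": "-archives"}


def _take_value(flag, it):
    """Return the value that follows a flag, or fail if it is missing."""
    value = next(it, None)
    if value is None:
        raise ValueError(f"{flag} requires a value")
    return value


def translate_streaming_args(args):
    """Rewrite Hadoop 0.18 streaming arguments using the 0.20 parameters."""
    merged = {"-files": [], "-archives": []}  # URIs collected per new flag
    defines = []                              # key=value pairs from -jobconf
    remaining = []                            # arguments passed through unchanged

    it = iter(args)
    for arg in it:
        if arg in MERGED_FLAGS:
            merged[MERGED_FLAGS[arg]].append(_take_value(arg, it))
        elif arg == "-jobconf":
            defines.append(_take_value(arg, it))
        else:
            remaining.append(arg)

    generic = []
    for pair in defines:
        generic += ["-D", pair]
    for flag, uris in merged.items():
        if uris:
            generic += [flag, ",".join(uris)]
    # Assumption: generic options are placed ahead of the streaming options.
    return generic + remaining


if __name__ == "__main__":
    old_args = [
        "-input", "s3://mybucket/in",           # hypothetical bucket and paths
        "-cacheFile", "s3://mybucket/a.txt",
        "-cacheFile", "s3://mybucket/b.txt",
        "-jobconf", "mapred.reduce.tasks=2",
        "-mapper", "map.py",
    ]
    print(" ".join(translate_streaming_args(old_args)))
```

Arguments that do not appear in the table are passed through unchanged and keep their relative order, so the sketch can be run over an argument list that already uses the new parameters without altering it.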


When you use Amazon EMR with Hadoop 0.20 and later, we offer the following additional guidance:

  • Recompile Cascading applications with Hadoop 1.0.3 specified as the Hadoop version so they can take advantage of the new features available in this version.

  • Pig scripts are fully supported.

  • All Amazon EMR sample applications are compatible. The Amazon EMR console supports only Hadoop 1.0.3, so samples launched from the console default to 1.0.3.