This section describes how to upgrade your Amazon Elastic MapReduce (Amazon EMR) deployment to Hadoop 1.0.3.
Note
The following information applies to Hadoop 0.20 and later, including Hadoop 1.0.3.
Many Hadoop jobs that run successfully on Hadoop 0.18 run without modification on Hadoop 0.20 and later. However, before you engage in a full upgrade, we recommend recompiling your Hadoop jobs against Hadoop 1.0.3 and testing on small subsets of your data.
Streaming jobs should also work without modification, but we recommend using the new streaming parameters introduced with version 0.20. These are summarized in the following table.
| Hadoop 0.18 | Hadoop 0.20 | Type |
|---|---|---|
| -cacheFile | -files | Comma-separated URIs |
| -cacheArchive | -archives | Comma-separated URIs |
| -jobconf | -D | key=value |
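
As an illustration of the mapping in the table, the following Python sketch rewrites a list of Hadoop 0.18 style streaming arguments into their 0.20 equivalents. The function name `modernize_streaming_args`, the bucket `s3://mybucket`, and the file names are hypothetical and used only for the example. The sketch assumes every old option is followed by exactly one value, and it moves the generic options (`-D`, `-files`, `-archives`) ahead of the streaming options, which is where Hadoop streaming expects generic options to appear. It is not an official Amazon EMR tool.

```python
def modernize_streaming_args(args):
    """Rewrite Hadoop 0.18 streaming options to their 0.20+ equivalents.

    -cacheFile    values are merged into one comma-separated -files option
    -cacheArchive values are merged into one comma-separated -archives option
    -jobconf k=v  becomes -D k=v
    All other arguments are kept in their original order.
    """
    files, archives, defines, rest = [], [], [], []
    it = iter(args)
    for arg in it:
        if arg in ("-cacheFile", "-cacheArchive", "-jobconf"):
            # Assumption: each old option is followed by exactly one value.
            value = next(it, None)
            if value is None:
                raise ValueError(f"{arg} is missing its value")
            if arg == "-cacheFile":
                files.append(value)
            elif arg == "-cacheArchive":
                archives.append(value)
            else:
                defines.append(value)
        else:
            rest.append(arg)

    # Generic options go first, followed by the untouched streaming options.
    generic = []
    for key_value in defines:
        generic += ["-D", key_value]
    if files:
        generic += ["-files", ",".join(files)]
    if archives:
        generic += ["-archives", ",".join(archives)]
    return generic + rest


# Hypothetical 0.18-style argument list.
old_args = [
    "-input", "s3://mybucket/in",
    "-cacheFile", "s3://mybucket/a.txt#a",
    "-jobconf", "mapred.reduce.tasks=2",
    "-cacheFile", "s3://mybucket/b.txt#b",
    "-mapper", "map.py",
]
print(modernize_streaming_args(old_args))
```

The helper only translates option names and positions. It does not check that the URIs exist or that the resulting job behaves the same, so testing on a small subset of your data is still advisable.
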
When using Amazon EMR with Hadoop 0.20 and later, we offer the following additional guidance:
- Recompile Cascading applications against Hadoop 1.0.3 so that they can take advantage of the new features available in this version.
- Pig scripts are fully supported.
- All Amazon EMR sample applications are compatible. The Amazon EMR console supports only Hadoop 1.0.3, so sample applications launched from the console default to 1.0.3.