Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Hadoop Version Behavior

The versions of Hive and Pig installed on your cluster depend on the Hadoop version installed on the cluster. For Hadoop version 1.0.3, Hive version 0.8.1 and Pig version 0.9.2 are used. For Hadoop version 0.20.205, Hive version 0.7.1 and Pig version 0.9.1 are used. For Hadoop version 0.20, Hive version 0.5 and Pig version 0.6 are used. For Hadoop version 0.18, Hive version 0.4 and Pig version 0.3 are used. You select the Hadoop version by setting HadoopVersion in JobFlowInstancesConfig.
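The pairings above can be captured in a small lookup table. The following is a minimal Python sketch for illustration only; the names BUNDLED_VERSIONS and bundled_versions are hypothetical and are not part of any Amazon EMR client library.

```python
# Hadoop version -> Hive and Pig versions, copied from the pairings listed above.
# BUNDLED_VERSIONS and bundled_versions are illustrative names, not an EMR API.
BUNDLED_VERSIONS = {
    "1.0.3": {"hive": "0.8.1", "pig": "0.9.2"},
    "0.20.205": {"hive": "0.7.1", "pig": "0.9.1"},
    "0.20": {"hive": "0.5", "pig": "0.6"},
    "0.18": {"hive": "0.4", "pig": "0.3"},
}


def bundled_versions(hadoop_version):
    """Return the Hive and Pig versions paired with the given Hadoop version."""
    try:
        return BUNDLED_VERSIONS[hadoop_version]
    except KeyError:
        raise ValueError(f"unsupported Hadoop version: {hadoop_version!r}") from None


print(bundled_versions("0.20.205"))  # prints {'hive': '0.7.1', 'pig': '0.9.1'}
```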

The Amazon EMR console supports Hadoop 1.0.3 with Hive 0.8.1 and Pig 0.9.2.

The default version of Hadoop for the Amazon EMR console and the command line interface is Hadoop 1.0.3 with Hive 0.8.1 and Pig 0.9.2. You can continue running Hadoop 0.18 with Hive 0.4 for the remainder of the Hadoop 0.18 lifecycle. Additional versions of Hive are available on the command line interface through Hive versioning. For more information, go to Supported Hive Versions.

For all clusters run from the Amazon EMR APIs or Java SDK, the default version of Hadoop is 0.18 with Hive 0.4 and Pig 0.3. This is to maintain compatibility with existing libraries and systems. You can continue running Hadoop 0.18 with Hive 0.4 and Pig 0.3 from the Amazon EMR API or Java SDK for the remainder of the Hadoop 0.18 lifecycle, but you should consider upgrading as soon as possible to take advantage of the features and performance improvements found in Hadoop 1.0.3, Hive 0.8.1, and Pig 0.9.2.

For more information, see Default AMI and Hadoop Versions.
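The defaults described above differ by how the cluster is launched. The sketch below, in Python, shows one way to reason about which Hadoop version a cluster ends up with. The client labels ("console", "cli", "api", "java_sdk") and the function name are assumptions made for this illustration; the real clients do not expose such a function.

```python
# Default Hadoop version per launch method, as described in the text above.
# The keys are illustrative labels chosen for this sketch.
DEFAULT_HADOOP_VERSION = {
    "console": "1.0.3",
    "cli": "1.0.3",
    "api": "0.18",
    "java_sdk": "0.18",
}


def effective_hadoop_version(client, requested=None):
    """Return the explicitly requested version, or the default for the client."""
    if requested is not None:
        return requested
    return DEFAULT_HADOOP_VERSION[client]


print(effective_hadoop_version("cli"))               # prints 1.0.3
print(effective_hadoop_version("java_sdk"))          # prints 0.18
print(effective_hadoop_version("java_sdk", "1.0.3")) # prints 1.0.3
```

The sketch does not model which explicit versions each launch method accepts; it only shows that an explicit request takes precedence over the default.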

You can choose to continue running Hadoop 0.18 with Hive 0.4 using either the command line interface or the Amazon EMR API by setting the HadoopVersion parameter in the RunJobFlow call. This parameter accepts the values 0.18, 0.20, 0.20.205, and 1.0.3. We have regenerated the client libraries to support the new API. Old clients and libraries continue to default to Hadoop 0.18. If you update to the new clients and want to run Hadoop 0.18, you must explicitly specify version 0.18 in your requests.
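As a rough illustration of pinning the version explicitly, the Python sketch below assembles the part of a RunJobFlow request that carries HadoopVersion inside the instances configuration. It only builds a dictionary and does not call AWS. The helper name build_run_job_flow_request is hypothetical, and the Name and InstanceCount fields are included only to give the request some shape; a real request needs further fields such as instance types.

```python
# Values the HadoopVersion parameter accepts, per the text above.
ACCEPTED_HADOOP_VERSIONS = ("0.18", "0.20", "0.20.205", "1.0.3")


def build_run_job_flow_request(name, hadoop_version, instance_count=1):
    """Build a partial RunJobFlow request with an explicit Hadoop version.

    Hypothetical helper for illustration; it validates the version and
    returns a plain dict rather than sending anything to Amazon EMR.
    """
    if hadoop_version not in ACCEPTED_HADOOP_VERSIONS:
        raise ValueError(
            f"HadoopVersion must be one of {ACCEPTED_HADOOP_VERSIONS}, "
            f"got {hadoop_version!r}"
        )
    return {
        "Name": name,
        # "Instances" holds the JobFlowInstancesConfig, where HadoopVersion is set.
        "Instances": {
            "HadoopVersion": hadoop_version,
            "InstanceCount": instance_count,
        },
    }


# Explicitly request Hadoop 0.18 instead of relying on a client default.
request = build_run_job_flow_request("legacy-cluster", "0.18")
print(request["Instances"]["HadoopVersion"])  # prints 0.18
```

Validating the value before sending keeps a typo such as "0.2" from silently falling through to whatever the client or service would do with an unknown version.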

The CLI runs Hadoop 1.0.3 by default. To run Hadoop version 0.18, either use an earlier version of the Ruby client or specify –HadoopVersion=0.18 when creating the cluster. As with other options in the command line client, you can specify the –HadoopVersion parameter in your .credentials file.