Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Supported Hadoop Versions

Amazon Elastic MapReduce (Amazon EMR) lets you choose which version of Hadoop to run. To do so, set the --ami-version parameter when you create a cluster with the CLI. The following table shows which AMI versions provide each Hadoop version. We recommend using the latest version of Hadoop to take advantage of performance enhancements and new functionality.

Note

The AMI version determines the Hadoop version; the --hadoop-version parameter is no longer supported.

Hadoop Version    Configuration Parameters
2.4.0             --ami-version 3.1.0 | 3.1.1 | 3.1.2 | 3.2.0 | 3.2.1
2.2.0             --ami-version 3.0.4 | 3.0.3 | 3.0.2 | 3.0.1
1.0.3             --ami-version 2.4.5 | 2.4.3 | 2.4.2
0.20.205          --ami-version 2.1.4
0.20              --ami-version 1.0
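Because the AMI version alone determines the Hadoop version, the table can be expressed as a simple lookup, for example to confirm which Hadoop version a script will get before it launches a cluster. The following is a minimal shell sketch; the function name hadoop_for_ami is hypothetical, and the mapping covers only the AMI versions listed in the table.

```shell
# Hypothetical helper: print the Hadoop version that ships with a given
# Amazon EMR AMI version, using only the pairs listed in the table.
# Unlisted AMI versions produce an error on stderr and exit status 1.
hadoop_for_ami() {
  case "$1" in
    3.1.0|3.1.1|3.1.2|3.2.0|3.2.1) echo "2.4.0" ;;
    3.0.1|3.0.2|3.0.3|3.0.4)       echo "2.2.0" ;;
    2.4.2|2.4.3|2.4.5)             echo "1.0.3" ;;
    2.1.4)                         echo "0.20.205" ;;
    1.0)                           echo "0.20" ;;
    *) echo "AMI version not in table: $1" >&2; return 1 ;;
  esac
}

# Example: AMI 3.1.0 runs Hadoop 2.4.0.
hadoop_for_ami 3.1.0
```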

For details about the default configuration and software available on AMIs used by Amazon EMR, see Choose an Amazon Machine Image (AMI).

Note

The Asia Pacific (Sydney) Region and AWS GovCloud (US) support only Hadoop 1.0.3 and later. AWS GovCloud (US) additionally requires AMI version 2.3.0 or later.
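One way to apply these regional minimums is a pre-launch check. This shell sketch assumes the region code ap-southeast-2 for Asia Pacific (Sydney) and region codes beginning with us-gov- for AWS GovCloud (US); the helpers version_ge and region_allows are hypothetical, and the caller supplies the Hadoop version that the table pairs with the chosen AMI.

```shell
# Hypothetical helper: succeed if dotted version $1 is >= version $2,
# comparing numeric fields left to right (missing fields count as 0).
version_ge() {
  local a="$1" b="$2" x y
  while [ -n "$a" ] || [ -n "$b" ]; do
    x=${a%%.*}; y=${b%%.*}
    x=${x:-0}; y=${y:-0}
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
    # Drop the field just compared; clear the string once no dots remain.
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# Hypothetical helper: check an AMI/Hadoop pair against the regional
# minimums in the note. The region codes are assumptions; this guide
# states no minimum for other regions, so they pass.
region_allows() {
  local region="$1" ami="$2" hadoop="$3"
  case "$region" in
    ap-southeast-2) version_ge "$hadoop" 1.0.3 ;;
    us-gov-*)       version_ge "$hadoop" 1.0.3 && version_ge "$ami" 2.3.0 ;;
    *)              return 0 ;;
  esac
}

# Example: AMI 2.1.4 (Hadoop 0.20.205) is rejected in Sydney.
region_allows ap-southeast-2 2.1.4 0.20.205 || echo "not supported in this region"
```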

To specify the Hadoop version using the AWS CLI

  • To specify the Hadoop version using the AWS CLI, type the create-cluster subcommand with the --ami-version parameter. The AMI version determines the version of Hadoop for Amazon EMR to use. For details about the version of Hadoop available on an AMI, see AMI Versions Supported in Amazon EMR.

    aws emr create-cluster --ami-version string \
    --instance-count integer --instance-type string

    For example, the following command launches a cluster running Hadoop 2.4.0 using AMI version 3.1.0:

    aws emr create-cluster --ami-version 3.1.0 \
    --instance-count 5 --instance-type m3.xlarge

    Note

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
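Following the recommendation to run the latest Hadoop version, a script can pick the newest AMI version listed for a given Hadoop version and build a create-cluster command in the same form as the example above. This sketch only prints the command so that you can review it before running it; the helper names latest_ami_for_hadoop and emr_create_cmd are hypothetical, and the mapping reflects only the table in this guide.

```shell
# Hypothetical helper: newest AMI version listed in the table for a
# given Hadoop version (the guide recommends the latest version).
latest_ami_for_hadoop() {
  case "$1" in
    2.4.0)    echo "3.2.1" ;;
    2.2.0)    echo "3.0.4" ;;
    1.0.3)    echo "2.4.5" ;;
    0.20.205) echo "2.1.4" ;;
    0.20)     echo "1.0" ;;
    *) echo "no AMI listed for Hadoop $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print, but do not run, an aws emr create-cluster
# command. Arguments: Hadoop version, instance count, instance type.
emr_create_cmd() {
  local ami
  ami=$(latest_ami_for_hadoop "$1") || return 1
  echo "aws emr create-cluster --ami-version $ami --instance-count $2 --instance-type $3"
}

emr_create_cmd 2.4.0 5 m3.xlarge
```

With five instances and no --instance-groups parameter, the printed command would launch one master node and four core nodes, all of type m3.xlarge.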

To specify the Hadoop version using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • Add the --ami-version option and specify the version number. The AMI version determines the version of Hadoop for Amazon EMR to use. The following example uses AMI version 3.1.0 to create a waiting cluster (the --alive option keeps it running after launch) that runs Hadoop 2.4.0. Amazon EMR launches the specified AMI, which includes that version of Hadoop. For details about the version of Hadoop available on an AMI, see AMI Versions Supported in Amazon EMR.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Test Hadoop" \
      --ami-version 3.1.0 \
      --num-instances 5 --instance-type m1.large
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "Test Hadoop" --ami-version 3.1.0 --num-instances 5 --instance-type m1.large
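For scripted launches with the Amazon EMR CLI on Linux, UNIX, or Mac OS X, the arguments from the example can be wrapped in a shell function that keeps a quoted cluster name such as "Test Hadoop" as a single argument. This is a sketch only: launch_waiting_cluster and the EMR_DRY_RUN variable are hypothetical, and the function assumes it is run from the directory where the Amazon EMR CLI is installed.

```shell
# Hypothetical wrapper for the Amazon EMR CLI (Linux, UNIX, Mac OS X).
# Arguments: cluster name, AMI version, instance count, instance type.
# Set EMR_DRY_RUN=1 (a hypothetical variable) to print the arguments,
# one per line, instead of launching a cluster.
launch_waiting_cluster() {
  local name="$1" ami="$2" count="$3" itype="$4"
  # Rebuild the positional parameters so "$name" stays one argument even
  # when it contains spaces.
  set -- --create --alive --name "$name" \
    --ami-version "$ami" \
    --num-instances "$count" --instance-type "$itype"
  if [ "${EMR_DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    ./elastic-mapreduce "$@"
  fi
}

# Preview the arguments for the Linux example without launching anything.
EMR_DRY_RUN=1 launch_waiting_cluster "Test Hadoop" 3.1.0 5 m1.large
```

To launch the cluster, call the function without EMR_DRY_RUN. On Windows, the equivalent invocation runs elastic-mapreduce through ruby, as in the Windows example.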