Amazon Elastic MapReduce (Amazon EMR) uses Amazon Machine Images (AMIs) to initialize the EC2 instances it launches to run a cluster. The AMIs contain the Linux operating system, Hadoop, and other software used to run the cluster. These AMIs are specific to Amazon EMR and can be used only in the context of running a cluster. Periodically, Amazon EMR updates these AMIs with new versions of Hadoop and other software, so users can take advantage of improvements and new features.
For general information about AMIs, go to Using AMIs in the Amazon Elastic Compute Cloud User Guide. For details about the software versions included in the Amazon EMR AMIs, go to the section called “AMI Versions Supported in Amazon EMR”.
If your application depends on a specific version or configuration of Hadoop, you might want to delay upgrading to a new AMI until you have tested your application on it. AMI versioning gives you the option to specify which AMI version your cluster uses to launch EC2 instances.
Specifying the AMI version during cluster creation is optional; if you do not provide an AMI-version parameter, and you are using the CLI, your clusters will run on the most recent AMI version. This means you always have the latest software running on your clusters, but you must ensure that your application will work with new changes as they are released.
If you specify an AMI version when you create a cluster, your instances will be created using that AMI. This provides stability for long-running or mission-critical applications. The trade-off is that your application will not have access to new features on more up-to-date AMI versions.
AMI version numbers are composed of three parts: major-version.minor-version.patch. The current version of the Amazon EMR CLI provides three ways to specify which version of the AMI to use to launch your cluster.
Fully specified: If you specify the AMI version using all three parts (e.g. --ami-version 2.0.1), your cluster will be launched on exactly that version. The preceding example would launch a cluster using AMI 2.0.1. This is useful if you are running an application that depends on a specific AMI version and you want to ensure that AMI version is the one used to launch your clusters. The downside is that you will not benefit from new features and improvements that are released on subsequent AMIs.
Major-minor version specified: If you specify just the major and minor version for the AMI (e.g. --ami-version 2.0), your cluster will be launched on the AMI that matches those specifications and has the latest patches. The preceding example would launch a cluster using AMI 2.0.4, since .4 is the latest patch for the 2.0 AMI series that is not deprecated. This scenario provides a measure of stability in the AMI version while ensuring that you receive the benefits of new patches and bug fixes.
Latest version specified: If you use the keyword latest instead of a version number for the AMI (e.g. --ami-version latest), the cluster will be launched with the latest version available. At this writing, the preceding example would launch a cluster using AMI 2.1.1, because that is the latest version currently available. This is the most dynamic way to run your clusters, as AMIs are updated regularly. This configuration is best for prototyping and testing and is not recommended for production environments.
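The three selection rules above can be sketched in a few lines of Python. This is an illustration only, not the Amazon EMR implementation: the function names, the list of available versions, and the set of deprecated versions are all assumptions supplied by the caller.

```python
def parse_version(text):
    """Turn '2.0.4' into (2, 0, 4) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve_ami_version(spec, available, deprecated=()):
    """Return the version that a given --ami-version value would select.

    `available` and `deprecated` are hypothetical inputs for this sketch;
    the real service decides which AMIs exist and which are deprecated.
    """
    candidates = [v for v in available if v not in deprecated]
    if spec != "latest":
        prefix = parse_version(spec)
        if len(prefix) not in (2, 3):
            raise ValueError("expected major.minor, major.minor.patch, or 'latest'")
        # A three-part spec matches exactly one version; a two-part spec
        # matches every patch release in that major.minor series.
        candidates = [v for v in candidates
                      if parse_version(v)[:len(prefix)] == prefix]
    if not candidates:
        raise ValueError("no usable AMI version matches %r" % spec)
    return max(candidates, key=parse_version)


# Illustrative data that mirrors the examples in the text above.
AVAILABLE = ["2.0.0", "2.0.1", "2.0.2", "2.0.3", "2.0.4", "2.0.5", "2.1.0", "2.1.1"]
DEPRECATED = {"2.0.5"}

print(resolve_ami_version("2.0.1", AVAILABLE, DEPRECATED))   # fully specified
print(resolve_ami_version("2.0", AVAILABLE, DEPRECATED))     # major-minor
print(resolve_ami_version("latest", AVAILABLE, DEPRECATED))  # latest keyword
```

Comparing parsed tuples rather than raw strings keeps the ordering numeric, so a patch number such as 10 would sort after 9.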
If you don't specify the AMI and Hadoop versions for the cluster, Amazon EMR launches your cluster with default versions. The default versions depend on the interface you use to launch the cluster.
Note
The default AMI is unavailable in the Asia Pacific (Sydney) Region. Instead, use --ami-version latest to specify the latest AMI for that region.
| Interface | Default AMI and Hadoop versions |
|---|---|
| Amazon EMR console | latest AMI and Hadoop versions |
| API | AMI 1.0, Hadoop 0.18 |
| SDK | AMI 1.0, Hadoop 0.18 |
| CLI (version 2012-07-30 and later) | latest AMI and Hadoop versions |
| CLI (versions 2011-12-08 to 2012-07-09) | AMI 2.1.3, Hadoop 0.20.205 |
| CLI (version 2011-12-11 and earlier) | AMI 1.0, Hadoop 0.18 |
You can specify which AMI version a new cluster should use when you create it. For details about the default configuration and applications available on AMI versions, see AMI Versions Supported in Amazon EMR.
Note
AMI versioning is not currently supported in the Amazon EMR console. Clusters created through the Amazon EMR console will use the latest version available.
To specify an AMI version using the CLI
When creating a cluster using the CLI, add the --ami-version parameter. If you do not specify this parameter, or if you specify --ami-version latest, the most recent version of the AMI will be used.
The following example specifies the AMI completely and will launch a cluster on AMI 2.0.1.
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Static AMI Version" \
--ami-version 2.0.1 \
--num-instances 5 --instance-type m1.small
Windows users:
ruby elastic-mapreduce --create --alive --name "Static AMI Version" --ami-version 2.0.1 --num-instances 5 --instance-type m1.small
The following example specifies the AMI using just the major and minor version. It will launch the cluster on the AMI that matches those specifications and has the latest patches. This example would launch a cluster using AMI 2.0.4, since .4 is the latest patch for the 2.0 AMI series that is not deprecated.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Major-Minor AMI Version" \
--ami-version 2.0 \
--num-instances 5 --instance-type m1.small
Windows users:
ruby elastic-mapreduce --create --alive --name "Major-Minor AMI Version" --ami-version 2.0 --num-instances 5 --instance-type m1.small
The following example specifies that the cluster should be launched with the most current version available. At this writing, this example would launch a cluster using AMI 2.2.0, because that is the latest version currently available.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "Latest AMI Version" \
--ami-version latest \
--num-instances 5 --instance-type m1.small
Windows users:
ruby elastic-mapreduce --create --alive --name "Latest AMI Version" --ami-version latest --num-instances 5 --instance-type m1.small
To specify an AMI version using the API
When creating a cluster using the API, add the AmiVersion and the HadoopVersion parameters to the request string, as shown in the following example. If you do not specify these parameters, Amazon EMR will create the cluster using the version 1.0 AMI and Hadoop 0.18.
For more information, go to RunJobFlow in the Amazon Elastic MapReduce API Reference.
https://elasticmapreduce.amazonaws.com?Operation=RunJobFlow
&Name=MyJobFlowName
&LogUri=s3n%3A%2F%2Fmybucket%2Fsubdir
&AmiVersion=1.0
&HadoopVersion=0.20
&Instances.MasterInstanceType=m1.small
&Instances.SlaveInstanceType=m1.small
&Instances.InstanceCount=4
&Instances.Ec2KeyName=myec2keyname
&Instances.Placement.AvailabilityZone=us-east-1a
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Steps.member.1.Name=MyStepName
&Steps.member.1.ActionOnFailure=CONTINUE
&Steps.member.1.HadoopJarStep.Jar=MyJarFile
&Steps.member.1.HadoopJarStep.MainClass=MyMainClass
&Steps.member.1.HadoopJarStep.Args.member.1=arg1
&Steps.member.1.HadoopJarStep.Args.member.2=arg2
&AuthParams
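As a rough illustration of how such a request string could be assembled, the following Python sketch URL-encodes a subset of the parameters shown above, including AmiVersion and HadoopVersion. The function name is hypothetical, and the authentication parameters (AuthParams) are deliberately left out, so the result is not a complete, signed request.

```python
from urllib.parse import urlencode

ENDPOINT = "https://elasticmapreduce.amazonaws.com"


def build_run_job_flow_url(ami_version, hadoop_version):
    """Assemble an unsigned RunJobFlow query string (illustration only)."""
    params = [
        ("Operation", "RunJobFlow"),
        ("Name", "MyJobFlowName"),
        ("LogUri", "s3n://mybucket/subdir"),
        ("AmiVersion", ami_version),
        ("HadoopVersion", hadoop_version),
        ("Instances.MasterInstanceType", "m1.small"),
        ("Instances.SlaveInstanceType", "m1.small"),
        ("Instances.InstanceCount", "4"),
        ("Instances.KeepJobFlowAliveWhenNoSteps", "true"),
    ]
    # urlencode percent-encodes reserved characters such as ':' and '/',
    # which is why the LogUri value appears as s3n%3A%2F%2F... in the request.
    return ENDPOINT + "?" + urlencode(params)


print(build_run_job_flow_url("1.0", "0.20"))
```

Passing the parameters as a list of pairs rather than a dictionary keeps them in the same order as the example request, which makes the two easier to compare.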
If you need to find out which AMI version a cluster is running, you can retrieve this information using the console, the CLI, or the API.
To check the current AMI version using the console
Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
Click on a cluster. The Ami Version and other details about the cluster are displayed in the navigational pane that appears.

To check the current AMI version using the CLI
Use the --describe parameter to retrieve the AMI version on a cluster. In the following example, JobFlowID is the identifier of the cluster. The AMI version will be returned along with other information about the cluster.
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --describe --jobflow JobFlowID
Windows users:
ruby elastic-mapreduce --describe --jobflow JobFlowID
To check the current AMI version using the API
Call DescribeJobFlows to check which AMI version a cluster is using.
The version will be returned as part of the response data, as shown in the following example.
For the complete response syntax, go to DescribeJobFlows in the Amazon Elastic MapReduce API Reference.
<DescribeJobFlowsResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
  <DescribeJobFlowsResult>
    <JobFlows>
      <member>
        ...
        <AmiVersion>
          2.1.3
        </AmiVersion>
        ...
      </member>
    </JobFlows>
  </DescribeJobFlowsResult>
  <ResponseMetadata>
    <RequestId>
      9cea3229-ed85-11dd-9877-6fad448a8419
    </RequestId>
  </ResponseMetadata>
</DescribeJobFlowsResponse>
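Once you have a response body like the one above, the AMI version can be extracted with any XML parser. The following Python sketch uses the standard library; the function name and the trimmed sample document are assumptions made for illustration, and the code matches elements by local name so that it does not depend on the exact namespace URI.

```python
import xml.etree.ElementTree as ET


def ami_versions_from_response(xml_text):
    """Return the text of every AmiVersion element in a response body."""
    root = ET.fromstring(xml_text)
    versions = []
    for element in root.iter():
        # ElementTree reports namespaced tags as '{uri}Name'; keep only 'Name'.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "AmiVersion" and element.text:
            versions.append(element.text.strip())
    return versions


# Trimmed stand-in for a DescribeJobFlows response (elided fields removed).
SAMPLE = """<DescribeJobFlowsResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
  <DescribeJobFlowsResult>
    <JobFlows>
      <member>
        <AmiVersion>
          2.1.3
        </AmiVersion>
      </member>
    </JobFlows>
  </DescribeJobFlowsResult>
</DescribeJobFlowsResponse>"""

print(ami_versions_from_response(SAMPLE))
```

The function returns a list because a single response can describe more than one cluster, each with its own member element.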
An AMI can contain multiple versions of Hadoop. If the AMI you specify has multiple versions of Hadoop available, you can select the version of Hadoop you want to run as described in Hadoop Configuration Reference. You cannot specify a Hadoop version that is not available on the AMI. For a list of the versions of Hadoop supported on each AMI, go to AMI Versions Supported in Amazon EMR.
Eighteen months after an AMI version is released, the Amazon EMR team might choose to deprecate that AMI version and no longer support it. In addition, the Amazon EMR team might deprecate an AMI before eighteen months have elapsed if a security risk or other issue is identified in the software or operating system of the AMI. If a cluster is running when its AMI is deprecated, the cluster will not be affected. You will not, however, be able to create new clusters with the deprecated AMI version. The best practice is to plan for AMI obsolescence and move to new AMI versions as soon as is practical for your application.
Before an AMI is deprecated, the Amazon EMR team will send out an announcement specifying the date on which the AMI version will no longer be supported.
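For planning purposes, you can compute the date that falls eighteen months after an AMI's release. The following Python sketch does only that arithmetic; the function names are hypothetical, and the result is not an announced deprecation date, since an AMI might be deprecated earlier or kept longer.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day so that, for example, 30 August maps to 28 February.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def eighteen_month_mark(release_date):
    """Earliest point at which routine deprecation might be considered."""
    return add_months(release_date, 18)


# AMI 2.0.0 was released on 11 December 2011 according to the table below.
print(eighteen_month_mark(date(2011, 12, 11)))
```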
Amazon EMR supports the AMI versions listed in the following table. You can specify the AMI version to use when you create a cluster. If you do not specify an AMI version, Amazon EMR creates the cluster using the default AMI version. For information about default AMI configurations, see Default AMI and Hadoop Versions.
| AMI Version | Description | Release Date |
|---|---|---|
| 2.3.6 | Same as 2.3.5, with the following additions: | 17 May 2013 |
| 2.3.5 | Same as 2.3.3, with the following additions: | 26 April 2013 |
| 2.3.4 | Deprecated | 16 April 2013 |
| 2.3.3 | Same as 2.3.2, with the following additions: | 01 March 2013 |
| 2.3.2 | Same as 2.3.1, with the following additions: | 07 February 2013 |
| 2.3.1 | Same as 2.3.0, with the following additions: | 24 December 2012 |
| 2.3.0 | Same as 2.2.4, with the following additions: | 20 December 2012 |
| 2.2.4 | Same as 2.2.3, with the following additions: | 6 December 2012 |
| 2.2.3 | Same as 2.2.1, with the following additions: | 30 November 2012 |
| 2.2.2 | Deprecated | 23 November 2012 |
| 2.2.1 | Same as 2.2.0, with the following additions: | 30 August 2012 |
| 2.2.0 | Same as 2.1.3, with the following additions: Operating system: Debian 6.0.5 (Squeeze). Applications: Hadoop 1.0.3, Hive 0.8.1.3, Pig 0.9.2.2, HBase 0.92.0. Languages: Perl 5.10.1, PHP 5.3.3, Python 2.6.6, R 2.11.1, Ruby 1.8.7. File system: ext3 for root, xfs for ephemeral. Kernel: Amazon Linux. | 6 August 2012 |
| 2.1.4 | Same as 2.1.3, with the following additions: | 30 August 2012 |
| 2.1.3 | Same as 2.1.2, with the following additions: | 6 August 2012 |
| 2.1.2 | Same as 2.1.1, with the following additions: | 6 August 2012 |
| 2.1.1 | Same as 2.1.0, with the following additions: | 3 July 2012 |
| 2.1.0 | Same as AMI 2.0.5, with the following additions: | 12 June 2012 |
| 2.0.5 | Note: Because of an issue with AMI 2.0.5, this version is deprecated. We recommend that you use a different AMI version instead. Same as AMI 2.0.4, with the following additions: | 19 April 2012 |
| 2.0.4 | Same as AMI 2.0.3, with the following additions: | 30 January 2012 |
| 2.0.3 | Same as AMI 2.0.2, with the following additions: | 24 January 2012 |
| 2.0.2 | Same as AMI 2.0.1, with the following additions: | 17 January 2012 |
| 2.0.1 | Same as AMI 2.0 except for the following bug fixes: | 19 December 2011 |
| 2.0.0 | Operating system: Debian 6.0.2 (Squeeze). Applications: Hadoop 0.20.205, Hive 0.7.1, Pig 0.9.1. Languages: Perl 5.10.1, PHP 5.3.3, Python 2.6.6, R 2.11.1, Ruby 1.8.7. File system: ext3 for root, xfs for ephemeral. Kernel: Amazon Linux. Note: Added support for the Snappy compression/decompression library. | 11 December 2011 |
| 1.0.1 | Same as AMI 1.0 except for the following change: | 3 April 2012 |
| 1.0.0 | Operating system: Debian 5.0 (Lenny). Applications: Hadoop 0.20 and 0.18 (default); Hive 0.5, 0.7 (default), 0.7.1; Pig 0.3 (on Hadoop 0.18), 0.6 (on Hadoop 0.20). Languages: Perl 5.10.0, PHP 5.2.6, Python 2.5.2, R 2.7.1, Ruby 1.8.7. File system: ext3 for root and ephemeral. Kernel: Red Hat. Note: This was the last AMI released before the CLI was updated to support AMI versioning. For backward compatibility, job flows launched with versions of the CLI downloaded before 11 December 2011 use this version. | 26 April 2011 |
Note
The cc2.8xlarge instance type is supported only on AMI 2.0.0 or later. The hi1.4xlarge and hs1.8xlarge instance types are supported only on AMI 2.3 or later.
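The restrictions in this note lend themselves to a simple pre-launch check. The following Python sketch encodes only the three instance types named above; the function names are hypothetical, and treating every other instance type as unrestricted is an assumption of the sketch, not a statement about Amazon EMR.

```python
# Minimum AMI versions taken from the note above (2.3 is read as 2.3.0).
MIN_AMI_FOR_INSTANCE_TYPE = {
    "cc2.8xlarge": (2, 0, 0),
    "hi1.4xlarge": (2, 3, 0),
    "hs1.8xlarge": (2, 3, 0),
}


def parse_version(text):
    """Turn '2.3' or '2.3.6' into a three-part tuple, padding with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_supported(instance_type, ami_version):
    """Check an instance type against the minimum AMI listed for it."""
    minimum = MIN_AMI_FOR_INSTANCE_TYPE.get(instance_type)
    if minimum is None:
        # Assumption: types not named in the note carry no AMI restriction.
        return True
    return parse_version(ami_version) >= minimum


print(is_supported("cc2.8xlarge", "1.0.1"))
print(is_supported("hi1.4xlarge", "2.3.6"))
```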