Amazon EMR
Developer Guide

Choose an Amazon Machine Image (AMI)

This documentation is for AMI versions 2.x and 3.x of Amazon EMR. For information about Amazon EMR releases 4.0.0 and above, see the Amazon EMR Release Guide. For information about managing the Amazon EMR service in 4.x releases, see the Amazon EMR Management Guide.

Amazon EMR uses Amazon Machine Images (AMIs) to initialize the EC2 instances it launches to run a cluster. The AMIs contain the Linux operating system and other software used to run the cluster. These AMIs are specific to Amazon EMR and can be used only in the context of running a cluster. Periodically, Amazon EMR updates these AMIs with new versions of applications such as Hadoop and other software, so users can take advantage of improvements and new features. If you create a new cluster using an updated AMI, you must ensure that your custom applications will work with it.

For general information about AMIs, see Amazon Machine Images in the Amazon EC2 User Guide for Linux Instances. For more information about the software versions included in the Amazon EMR AMIs, see AMI Versions Supported in Amazon EMR Versions 2.x and 3.x.

AMI versioning gives you the option to choose the specific AMI your cluster uses to launch EC2 instances. If your application depends on a specific version or configuration of Hadoop, you might want to delay upgrading to a new AMI until you have tested your application on it.

Specifying the AMI version during cluster creation is required when you use the console or the AWS CLI, and is optional in the API and SDK. If you specify an AMI version when you create a cluster, your instances are created using that AMI. This provides stability for long-running or mission-critical applications. The trade-off is that your application does not have access to new features in more recent AMI versions unless you launch a new cluster using a newer AMI. For more information on specifying the AMI version, see AMI Version Numbers (Versions 2.x, 3.x).

If you use the API or SDK and do not provide an AMI version parameter, your clusters run on the default AMI version for the tool you are using.
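
The optional nature of the AMI version in the API and SDK can be sketched as follows. This is a minimal illustration, not a complete cluster definition: the helper name build_run_job_flow_args, the cluster names, the instance types and count, and the version string "3.11.0" are all assumptions chosen for the example. The key names follow the RunJobFlow request parameters (Name, Instances, AmiVersion).

```python
from typing import Any, Dict, Optional


def build_run_job_flow_args(name: str, ami_version: Optional[str] = None) -> Dict[str, Any]:
    """Assemble keyword arguments for a RunJobFlow request.

    Hypothetical helper for illustration only. The instance settings
    below are placeholder values, not recommendations.
    """
    args: Dict[str, Any] = {
        "Name": name,
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance type
            "SlaveInstanceType": "m3.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": False,
        },
    }
    # Pin the AMI only when the caller asks for a specific version.
    # Leaving the key out lets the tool's default AMI version apply.
    if ami_version is not None:
        args["AmiVersion"] = ami_version
    return args


# Pinned: the cluster launches on the named AMI version (example value).
pinned = build_run_job_flow_args("pinned-cluster", ami_version="3.11.0")

# Unpinned: no AmiVersion key, so the default AMI version is used.
unpinned = build_run_job_flow_args("default-cluster")
```

With an SDK such as boto3 installed and credentials configured, a dictionary like this could be passed to the EMR client's run_job_flow call as keyword arguments. A real request is likely to need further settings (for example, IAM roles or a log location) that are outside the scope of this sketch.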