Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Availability Zones and Regions

When you launch a cluster, you have the option to specify a region and an Availability Zone within that region.

If you do not specify an Availability Zone when you launch a cluster, Amazon Elastic MapReduce (Amazon EMR) selects the Availability Zone with the lowest Spot Instance pricing and the largest available capacity of the EC2 instance types specified for your core instance group, and then launches the master, core, and task instance groups in that Availability Zone.
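The selection rule above can be sketched in a few lines of Python. This is only an illustration, not Amazon EMR's actual implementation: the guide does not say how price and capacity are weighed against each other, so the sketch assumes price is compared first and capacity breaks ties. The `ZoneSnapshot` type, the `pick_zone` function, and the capacity figures are hypothetical.

```python
from typing import NamedTuple, Sequence


class ZoneSnapshot(NamedTuple):
    """Hypothetical view of one Availability Zone for the core instance type."""
    zone: str
    spot_price: float    # current Spot Price in USD per hour
    free_capacity: int   # assumed count of unused instances of the core type


def pick_zone(snapshots: Sequence[ZoneSnapshot]) -> str:
    """Return the zone with the lowest price; ties go to the larger capacity.

    Assumption: price takes precedence over capacity. The real weighting
    used by Amazon EMR is not documented in this guide.
    """
    if not snapshots:
        raise ValueError("at least one Availability Zone is required")
    best = min(snapshots, key=lambda s: (s.spot_price, -s.free_capacity))
    return best.zone


zones = [
    ZoneSnapshot("us-east-1a", 0.12, 40),
    ZoneSnapshot("us-east-1b", 0.10, 15),
    ZoneSnapshot("us-east-1c", 0.10, 60),
]
selected = pick_zone(zones)
```

Only the core instance type enters the comparison, which is why the master or task instance types may find less capacity in the selected zone.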

Because Spot Prices fluctuate independently in each Availability Zone, selecting the Availability Zone with the lowest initial price (or allowing Amazon EMR to select it for you) might not result in the lowest price for the life of the cluster. For best results, study the Spot Price history of each Availability Zone before choosing one for your cluster.
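One way to study price history is to compare the average price per Availability Zone rather than the most recent price. The following is a minimal sketch that assumes each history record is a dictionary with `AvailabilityZone` and `SpotPrice` keys (the shape returned by the EC2 Spot Price history API); the sample records and function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_price_by_zone(history):
    """Group Spot Price records by zone and return the mean price per zone.

    Assumption: each record is a dict with 'AvailabilityZone' and
    'SpotPrice' keys, where 'SpotPrice' is a numeric string.
    """
    prices = defaultdict(list)
    for record in history:
        prices[record["AvailabilityZone"]].append(float(record["SpotPrice"]))
    return {zone: mean(values) for zone, values in prices.items()}


def cheapest_on_average(history):
    """Return the zone whose mean historical price is lowest."""
    averages = average_price_by_zone(history)
    return min(averages, key=averages.get)


# Invented sample: zone "a" has the lowest first price but spikes later.
sample_history = [
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.08"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.10"},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.20"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.10"},
    {"AvailabilityZone": "us-east-1a", "SpotPrice": "0.20"},
    {"AvailabilityZone": "us-east-1b", "SpotPrice": "0.10"},
]
best_zone = cheapest_on_average(sample_history)
```

The mean here is unweighted. Because history records typically mark price changes rather than fixed intervals, a time-weighted average would be a more faithful measure for long-running clusters.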

Note

Because Amazon EMR selects the Availability Zone based on the free capacity of the EC2 instance type you specified for the core instance group, your cluster may end up in an Availability Zone with less capacity in other EC2 instance types. For example, if you launch your core instance group as Large and your master instance group as Extra Large, the cluster may launch into an Availability Zone with too little unused Extra Large capacity to fulfill a Spot Instance request for your master node. If you run into this situation, you can launch the master instance group as on-demand, even if you launch the core instance group as Spot Instances.
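The workaround in this note, an on-demand master with a Spot core group, might be expressed as the following instance group list. This is a sketch: the field names follow the RunJobFlow instance group structure, but the helper name, the instance types (`m1.xlarge` for Extra Large, `m1.large` for Large), and the bid price are assumptions chosen for illustration.

```python
def build_instance_groups(core_count, core_bid,
                          master_type="m1.xlarge", core_type="m1.large"):
    """Return instance group definitions with an on-demand master and Spot core.

    Assumptions: the default instance types and any bid price passed in are
    illustrative only; choose values that match your own workload.
    """
    return [
        {
            "Name": "master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",   # avoids depending on Spot capacity
            "InstanceType": master_type,
            "InstanceCount": 1,
        },
        {
            "Name": "core",
            "InstanceRole": "CORE",
            "Market": "SPOT",
            "BidPrice": core_bid,    # maximum Spot bid, passed as a string
            "InstanceType": core_type,
            "InstanceCount": core_count,
        },
    ]


groups = build_instance_groups(core_count=4, core_bid="0.10")
```

Only the core group carries a bid price, so only the core group is exposed to Spot capacity in the selected Availability Zone.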

If you specify an Availability Zone for the cluster, Amazon EMR launches all of the instance groups in that Availability Zone.
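As a rough illustration, pinning a cluster to one Availability Zone amounts to adding a placement entry to the instance configuration of the launch request. The sketch below assumes the RunJobFlow request shape, in which `Placement` holds an `AvailabilityZone` value; the helper name and the zone name are hypothetical.

```python
def build_instances_config(instance_groups, availability_zone=None):
    """Return an instance configuration, optionally pinned to one zone.

    When availability_zone is None, no placement is included and Amazon EMR
    chooses the zone itself.
    """
    config = {"InstanceGroups": list(instance_groups)}
    if availability_zone is not None:
        # Applies to every instance group in the cluster.
        config["Placement"] = {"AvailabilityZone": availability_zone}
    return config


example_groups = [
    {"InstanceRole": "MASTER", "InstanceType": "m1.large", "InstanceCount": 1},
    {"InstanceRole": "CORE", "InstanceType": "m1.large", "InstanceCount": 2},
]
pinned = build_instances_config(example_groups, "us-east-1b")
unpinned = build_instances_config(example_groups)
```

The placement is set once for the whole configuration rather than per instance group.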

All instance groups in a cluster are launched into a single Availability Zone, regardless of whether they are on-demand or Spot Instances. A single Availability Zone is used because the additional data transfer costs and performance overhead make running instance groups in multiple Availability Zones undesirable.

Note

Selecting the Availability Zone is currently not available in the Amazon EMR console. Amazon EMR assigns an Availability Zone to clusters launched from the Amazon EMR console as described above.