Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Choose What to Launch as Spot Instances

When you launch a cluster in Amazon Elastic MapReduce (Amazon EMR), you can choose to launch any or all of the instance groups (master, core, and task) as Spot Instances. Because each type of instance group plays a different role in the cluster, the implications of launching each instance group as Spot Instances vary.

After you launch an instance group as either on-demand or Spot Instances, you cannot change its classification while the cluster is running. To change an on-demand instance group to Spot Instances or vice versa, you must terminate the cluster and launch a new one.

The following table shows launch configurations for using Spot Instances in various applications.

Project                 | Master Instance Group | Core Instance Group | Task Instance Group
Long-running clusters   | on-demand             | on-demand           | spot
Cost-driven workloads   | spot                  | spot                | spot
Data-critical workloads | on-demand             | on-demand           | spot
Application testing     | spot                  | spot                | spot
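
As a rough illustration, the rows of the table above can be turned into the instance group settings supplied at launch. The sketch below assumes the instance group structure of the RunJobFlow API (fields `InstanceRole`, `Market`, `BidPrice`, `InstanceType`, `InstanceCount`). The helper name `build_instance_groups`, the dictionary keys, the instance type, the node counts, and the bid price are hypothetical placeholders, not recommendations.

```python
# Launch configurations from the table above, keyed by project type.
# Each value is the market for the (master, core, task) instance groups.
LAUNCH_CONFIGS = {
    "long-running": ("ON_DEMAND", "ON_DEMAND", "SPOT"),
    "cost-driven": ("SPOT", "SPOT", "SPOT"),
    "data-critical": ("ON_DEMAND", "ON_DEMAND", "SPOT"),
    "application-testing": ("SPOT", "SPOT", "SPOT"),
}


def build_instance_groups(project, instance_type, counts, bid_price):
    """Hypothetical helper: return one settings dict per instance group.

    counts is (master, core, task). bid_price is a string in USD and is
    attached only to groups launched as Spot Instances.
    """
    roles = ("MASTER", "CORE", "TASK")
    groups = []
    for role, market, count in zip(roles, LAUNCH_CONFIGS[project], counts):
        group = {
            "InstanceRole": role,
            "Market": market,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }
        if market == "SPOT":
            group["BidPrice"] = bid_price
        groups.append(group)
    return groups


# Placeholder values for illustration only.
groups = build_instance_groups("data-critical", "m1.large", (1, 4, 6), "0.10")
```

A list shaped like `groups` is what the `Instances.InstanceGroups` parameter of a RunJobFlow request expects. Only the task group carries a `BidPrice` here, because it is the only group in this configuration launched as Spot Instances.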

Master Instance Group as Spot Instances

The master node controls and directs the cluster. When it terminates, the cluster ends, so you should only launch the master node as a Spot Instance if you are running a cluster where sudden termination is acceptable. This might be the case if you are testing a new application, have a cluster that periodically persists data to an external store such as Amazon S3, or are running a cluster where cost is more important than ensuring the cluster’s completion.

When you launch the master instance group as a Spot Instance, the cluster will not start until that Spot Instance request is fulfilled. Take this into account when selecting your bid price, because a bid that is too low can delay the start of the cluster.

You can add a Spot Instance master node only when you launch the cluster. Master nodes cannot be added to or removed from a running cluster.

Typically, you would only run the master node as a Spot Instance if you are running the entire cluster (all instance groups) as Spot Instances.

Core Instance Group as Spot Instances

Core nodes process data and store information using HDFS. Because termination of core nodes can result in data loss and possible termination of the cluster, you would typically only run core nodes as Spot Instances if you are either not running task nodes or running task nodes as Spot Instances.

When you launch the core instance group as Spot Instances, Amazon EMR waits until it can provision all of the requested core instances before launching the instance group. This means that if you request a core instance group with six nodes, the instance group will not launch if there are only five nodes available at or below your bid price. In this case, Amazon EMR continues to wait until either all six core nodes are available at or below your bid price or you terminate the cluster.

You can add Spot Instance core nodes either when you launch the cluster or later to add capacity to a running cluster. You cannot remove core nodes from a running cluster.

Task Instance Group as Spot Instances

The task nodes process data but do not hold persistent data in HDFS. If they terminate because the Spot Price has risen above your bid price, no data is lost and the effect on your cluster is minimal.

When you launch the task instance group as Spot Instances, Amazon EMR will provision as many task nodes as it can at your bid price. This means that if you request a task instance group with six nodes, and only five Spot Instances are available at your bid price, Amazon EMR will launch the instance group with five nodes, adding the sixth later if it can.
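
The difference between this best-effort behavior and the all-or-nothing behavior of the core instance group described earlier can be sketched as a toy model. The function names are hypothetical, and the model deliberately ignores later retries (Amazon EMR keeps waiting for the core group, and may add the missing task nodes later).

```python
def core_nodes_launched(requested, available):
    """Core group: all-or-nothing.

    The group launches only when every requested node can be provisioned
    at or below the bid price; otherwise nothing launches yet.
    """
    return requested if available >= requested else 0


def task_nodes_launched(requested, available):
    """Task group: best effort.

    The group launches with as many nodes as are available at the bid
    price, up to the number requested.
    """
    return min(requested, available)


# Six nodes requested, five Spot Instances available at the bid price.
print(core_nodes_launched(6, 5))  # 0
print(task_nodes_launched(6, 5))  # 5
```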

Launching the task instance group as Spot Instances is a strategic way to expand the capacity of your cluster while minimizing costs. If you launch your master and core instance groups as on-demand instances, their capacity is guaranteed for the run of the cluster and you can add task instances to the instance group as needed to handle peak traffic or to speed up data processing.

You can add Spot Instance task nodes to a running cluster and remove them from it.
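
The rules in this topic for changing the number of nodes in a running cluster can be summarized in a small lookup. This is only a sketch of the rules as stated here, not an Amazon EMR API; the names `RESIZE_RULES` and `can_resize` are hypothetical.

```python
# (can_add, can_remove) for nodes in a running cluster, per instance group.
RESIZE_RULES = {
    "MASTER": (False, False),  # fixed at launch
    "CORE": (True, False),     # can grow, cannot shrink
    "TASK": (True, True),      # can grow and shrink
}


def can_resize(role, current_count, new_count):
    """Return True if the change in node count is allowed for this group."""
    can_add, can_remove = RESIZE_RULES[role]
    if new_count > current_count:
        return can_add
    if new_count < current_count:
        return can_remove
    return True  # no change requested


print(can_resize("CORE", 4, 6))  # True
print(can_resize("CORE", 4, 2))  # False
print(can_resize("TASK", 6, 2))  # True
```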