Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.Did this page help you?  Yes | No |  Tell us about it...

Lower Costs with Spot Instances (Optional)

When Amazon EC2 has unused capacity, it offers Amazon EC2 instances at a reduced cost, called the Spot Price. This price fluctuates based on availability and demand. You can purchase Spot Instances by placing a request that includes the highest bid price you are willing to pay for those instances. When the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.

For more information about Spot Instances, see Using Spot Instances in the Amazon EC2 User Guide for Linux Instances.

The following video describes how Spot Instances work in Amazon EMR and walks you through the process of launching a cluster on Spot Instances from the Amazon EMR console.

Additional video instruction includes:

If your workload is flexible in terms of time of completion or required capacity, Spot Instances can significantly reduce the cost of running your clusters. Workloads that are ideal for using Spot Instances include: application testing, time-insensitive workloads, and long-running clusters with fluctuations in load.

Note

We do not recommend Spot Instances for master and core nodes unless the cluster is expected to be short-lived and the workload is non-critical. Also, Spot Instances are not recommended for clusters that are time-critical or that need guaranteed capacity. These clusters should be launched using on-demand instances.

When Should You Use Spot Instances?

There are several scenarios in which Spot Instances are useful for running an Amazon EMR cluster.

Long-Running Clusters and Data Warehouses

If you are running a persistent Amazon EMR cluster, such as a data warehouse, that has a predictable variation in computational capacity, you can handle peak demand at lower cost with Spot Instances. Launch your master and core instance groups as on-demand to handle the normal capacity and launch the task instance group as Spot Instances to handle your peak load requirements.

Cost-Driven Workloads

If you are running transient clusters for which lower cost is more important than the time to completion, and losing partial work is acceptable, you can run the entire cluster (master, core, and task instance groups) as Spot Instances to benefit from the largest cost savings.

Data-Critical Workloads

If you are running a cluster for which lower cost is more important than time to completion, but losing partial work is not acceptable, launch the master and core instance groups as on-demand and supplement with a task instance group of Spot Instances. Running the master and core instance groups as on-demand ensures that your data is persisted in HDFS and that the cluster is protected from termination due to Spot market fluctuations, while providing cost savings that accrue from running the task instance group as Spot Instances.

Application Testing

When you are testing a new application in order to prepare it for launch in a production environment, you can run the entire cluster (master, core, and task instance groups) as Spot Instances to reduce your testing costs.