When Amazon EC2 has unused capacity, it offers instances at a reduced price, called the Spot Price. This price fluctuates based on availability and demand. You purchase Spot Instances by placing a request that specifies the highest price (the bid price) you are willing to pay for them. While the Spot Price is below your bid price, your Spot Instances are launched and you are billed the Spot Price, not your bid price. If the Spot Price rises above your bid price, Amazon EC2 terminates your Spot Instances.
For more information about Spot Instances, see Using Spot Instances in the Amazon EC2 User Guide for Linux Instances.
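The bid-price rule in the introduction can be sketched as a small simulation. This is an illustrative model only, not how Amazon EC2 is implemented. The function name, the hourly price series, and the dollar values are hypothetical. It also assumes that a Spot Price exactly equal to the bid keeps the instance running, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass
class SpotResult:
    hours_run: int      # hours the instance was running
    total_cost: float   # sum of the Spot Prices billed for those hours
    terminated: bool    # True if the Spot Price rose above the bid after launch


def simulate_spot(bid_price, hourly_spot_prices):
    """Apply the simplified bid rule to a series of hourly Spot Prices."""
    launched = False
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price <= bid_price:
            # Spot Price is at or below the bid: the instance runs and is
            # billed the Spot Price (assumption: equality counts as running).
            launched = True
            hours_run += 1
            total_cost += price
        elif launched:
            # Spot Price rose above the bid after launch: terminated.
            return SpotResult(hours_run, total_cost, terminated=True)
        # Otherwise the request is still waiting for the price to drop.
    return SpotResult(hours_run, total_cost, terminated=False)


# Hypothetical prices: the request waits one hour, runs two, then is terminated.
result = simulate_spot(1.0, [1.25, 0.25, 0.5, 1.25, 0.25])
print(result.hours_run, result.total_cost, result.terminated)  # prints: 2 0.75 True
```

Note that the cost is the sum of the Spot Prices actually charged (0.25 + 0.5), not two hours at the bid price.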
The following video describes how Spot Instances work in Amazon EMR and walks you through the process of launching a cluster on Spot Instances from the Amazon EMR console.
Additional video instruction includes:
Amazon EC2 - Deciding on your Spot Bidding Strategy, describes strategies to use when setting a bid price for Spot Instances.
Amazon EC2 - Managing Interruptions for Spot Instance Workloads, describes ways to handle Spot Instance termination.
If your workload is flexible in its completion time or required capacity, Spot Instances can significantly reduce the cost of running your clusters. Workloads that are well suited to Spot Instances include application testing, time-insensitive workloads, and long-running clusters with fluctuating load.
We do not recommend Spot Instances for master and core nodes unless the cluster is expected to be short-lived and the workload is non-critical. Spot Instances are also not recommended for clusters that are time-critical or that need guaranteed capacity. Launch these clusters using On-Demand Instances.
There are several scenarios in which Spot Instances are useful for running an Amazon EMR cluster.
If you are running a persistent Amazon EMR cluster, such as a data warehouse, whose demand for computational capacity varies predictably, you can handle peak demand at lower cost with Spot Instances. Launch your master and core instance groups as On-Demand Instances to handle the normal capacity, and launch the task instance group as Spot Instances to handle your peak load requirements.
If you are running transient clusters for which lower cost is more important than the time to completion, and losing partial work is acceptable, you can run the entire cluster (master, core, and task instance groups) as Spot Instances to benefit from the largest cost savings.
If you are running a cluster for which lower cost is more important than time to completion, but losing partial work is not acceptable, launch the master and core instance groups as On-Demand Instances and supplement them with a task instance group of Spot Instances. Running the master and core instance groups as On-Demand Instances ensures that your data is persisted in HDFS and that the cluster is protected from termination due to Spot market fluctuations. You still save on cost by running the task instance group as Spot Instances.
When you are testing a new application in order to prepare it for launch in a production environment, you can run the entire cluster (master, core, and task instance groups) as Spot Instances to reduce your testing costs.
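The scenarios above differ mainly in which instance groups run as Spot Instances. The following sketch builds that per-scenario configuration as plain Python data. The dictionary keys (`InstanceRole`, `Market`, `BidPrice`, and so on) are intended to match the `Instances.InstanceGroups` parameter of the boto3 EMR `run_job_flow` call, so check them against the API reference before use. The scenario names, instance type, instance counts, and bid price are hypothetical placeholders.

```python
# Which roles run as Spot Instances in each scenario described above.
# The scenario names are illustrative labels, not official terms.
SPOT_ROLES_BY_SCENARIO = {
    "long_running": {"TASK"},                    # Spot only for peak load
    "cost_driven": {"MASTER", "CORE", "TASK"},   # partial work loss acceptable
    "data_critical": {"TASK"},                   # HDFS data stays on On-Demand
    "application_testing": {"MASTER", "CORE", "TASK"},
}


def build_instance_groups(scenario, bid_price, instance_type="m5.xlarge"):
    """Return a list of instance group definitions for the given scenario."""
    spot_roles = SPOT_ROLES_BY_SCENARIO[scenario]
    counts = {"MASTER": 1, "CORE": 2, "TASK": 4}  # placeholder sizes
    groups = []
    for role in ("MASTER", "CORE", "TASK"):
        group = {
            "Name": f"{role.lower()}-group",
            "InstanceRole": role,
            "InstanceType": instance_type,
            "InstanceCount": counts[role],
            "Market": "SPOT" if role in spot_roles else "ON_DEMAND",
        }
        if role in spot_roles:
            # The bid price is passed as a string in USD (assumed format).
            group["BidPrice"] = bid_price
        groups.append(group)
    return groups


for group in build_instance_groups("data_critical", "0.10"):
    print(group["InstanceRole"], group["Market"], group.get("BidPrice"))
```

In the `data_critical` case only the task group carries a `BidPrice`. Passing the resulting list to a real cluster launch would additionally require credentials, roles, and a release label, which are outside the scope of this sketch.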