Life Cycle of a Cluster
The following diagram shows the life cycle of a cluster and how each stage maps to a particular cluster state.
A successful Amazon Elastic MapReduce (Amazon EMR) cluster follows this process: Amazon EMR first
provisions a Hadoop cluster. During this phase, the cluster state is
STARTING. Next, any user-defined bootstrap actions are run.
During this phase, the cluster state is
BOOTSTRAPPING. Once the cluster reaches this phase, you are billed for the provisioned EC2 instances.
After all bootstrap actions are completed, the cluster state is
RUNNING. The job flow sequentially runs all cluster steps
during this phase.
If you configured your cluster as a long-running cluster by enabling keep
alive, the cluster goes into a
WAITING state after processing is
done and waits for the next set of instructions. You must manually
terminate the cluster when you no longer require it. For more information, see How to Send Work to a Cluster and Choose the Cluster Lifecycle: Long-Running or
Transient.
If you configured your cluster as a transient cluster, it automatically shuts down after all of the steps complete.
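As a rough illustration of how the two lifecycles differ in configuration, the following Python sketch builds the `Instances` portion of a `RunJobFlow` request. `KeepJobFlowAliveWhenNoSteps` is the keep-alive field of that API. The helper name `build_instances_config` is hypothetical, and the instance type and count are placeholder values, not recommendations.

```python
def build_instances_config(long_running):
    """Return an Instances dict for a RunJobFlow request.

    Hypothetical helper: the instance type and count below are placeholders.
    """
    return {
        "MasterInstanceType": "m1.large",  # placeholder value
        "SlaveInstanceType": "m1.large",   # placeholder value
        "InstanceCount": 3,                # placeholder value
        # True  -> long-running: the cluster waits for more work after its steps
        # False -> transient: the cluster shuts down after the last step
        "KeepJobFlowAliveWhenNoSteps": long_running,
    }


transient_config = build_instances_config(long_running=False)
long_running_config = build_instances_config(long_running=True)
```

The resulting dictionary would be passed as the `Instances` argument of the request. Only the keep-alive flag changes between the two lifecycles.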
When a cluster terminates without encountering an error, the state transitions to
SHUTTING_DOWN and the cluster shuts down, terminating the
virtual server instances. All data stored on the cluster is deleted. Information
stored elsewhere, such as in your Amazon S3 bucket, persists. Finally, when all
cluster activity is complete, the cluster state is marked as COMPLETED.
Unless termination protection is enabled, any failure during the cluster
process terminates the cluster and all its virtual server instances. Any data
stored on the cluster is deleted. The cluster state is marked as
FAILED. For more information, see Managing Cluster Termination.
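The life cycle described in this section can be modeled as a small state machine. The Python sketch below is a simplified model built only from the transitions this section mentions, not the service's implementation. The `TRANSITIONS` table and the `is_valid_path` helper are hypothetical names, and the `WAITING` to `RUNNING` edge is an assumption drawn from "wait for the next set of instructions". The service defines additional states that are not modeled here.

```python
# Simplified model of the transitions described in this section.
# Assumption: a failure can occur in any active state and leads to FAILED.
TRANSITIONS = {
    "STARTING": {"BOOTSTRAPPING", "FAILED"},
    "BOOTSTRAPPING": {"RUNNING", "FAILED"},
    "RUNNING": {"WAITING", "SHUTTING_DOWN", "FAILED"},
    # Assumption: new work moves a waiting cluster back to RUNNING.
    "WAITING": {"RUNNING", "SHUTTING_DOWN", "FAILED"},
    "SHUTTING_DOWN": {"COMPLETED"},
    "COMPLETED": set(),  # terminal
    "FAILED": set(),     # terminal
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


transient_path = ["STARTING", "BOOTSTRAPPING", "RUNNING", "SHUTTING_DOWN", "COMPLETED"]
long_running_path = ["STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING",
                     "RUNNING", "WAITING", "SHUTTING_DOWN", "COMPLETED"]
skips_bootstrap = ["STARTING", "RUNNING"]

transient_ok = is_valid_path(transient_path)        # allowed by the model
long_running_ok = is_valid_path(long_running_path)  # allowed by the model
skips_ok = is_valid_path(skips_bootstrap)           # rejected by the model
```

A table of allowed transitions keeps the model easy to compare against the prose: each paragraph above corresponds to one or two entries.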
For a complete list of cluster states, see the JobFlowExecutionStatusDetail data type in the Amazon Elastic MapReduce (Amazon EMR) API Reference.
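A client that tracks a cluster typically polls its state until it reaches a terminal value. The following Python sketch shows one way to do that under stated assumptions: `wait_for_terminal_state` and its `fetch_state` callable are hypothetical, and `fetch_state` stands in for whatever API call returns the current state string. `COMPLETED` and `FAILED` come from this section; `TERMINATED` is included from the API reference's list of states.

```python
import time

# COMPLETED and FAILED are described above; TERMINATED is listed in the API reference.
TERMINAL_STATES = {"COMPLETED", "FAILED", "TERMINATED"}


def wait_for_terminal_state(fetch_state, poll_seconds=30.0, max_polls=120,
                            sleep=time.sleep):
    """Poll fetch_state() until a terminal state appears.

    Returns the distinct states observed, in order. Raises TimeoutError if no
    terminal state is seen within max_polls polls.
    """
    seen = []
    state = None
    for _ in range(max_polls):
        state = fetch_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record only changes of state
        if state in TERMINAL_STATES:
            return seen
        sleep(poll_seconds)
    raise TimeoutError(f"cluster still in state {state!r} after {max_polls} polls")


# Simulated usage with a canned sequence of states instead of a real API call.
_fake_states = iter(["STARTING", "STARTING", "BOOTSTRAPPING", "RUNNING",
                     "SHUTTING_DOWN", "COMPLETED"])
observed = wait_for_terminal_state(lambda: next(_fake_states),
                                   sleep=lambda seconds: None)
```

Passing `fetch_state` and `sleep` as arguments keeps the loop independent of any particular client library and lets it be exercised without a live cluster.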