Configuring an Amazon EMR cluster to continue or terminate after step execution
This topic explains the differences between using a long-running cluster and creating a transient cluster that shuts down after the last step runs. It also covers how to configure step execution for a cluster.
Create a long-running cluster
By default, clusters that you create with the console or the AWS CLI are long-running. Long-running clusters continue to run, accept work, and accrue charges until you take action to shut them down.
A long-running cluster is effective in the following situations:
- When you need to interactively or automatically query data.
- When you need to interact with big data applications hosted on the cluster on an ongoing basis.
- When you periodically process a data set that is so large, or process it so frequently, that launching a new cluster and loading the data each time is inefficient.
You can also set termination protection on a long-running cluster to avoid shutting down EC2 instances by accident or error. For more information, see Using termination protection to protect your Amazon EMR clusters from accidental shut down.
Note
Amazon EMR automatically enables termination protection for all clusters with multiple primary nodes, and overrides any step execution settings that you supply when you create the cluster. You can disable termination protection after the cluster has been launched. See Configuring termination protection for running clusters. To shut down a cluster with multiple primary nodes, you must first modify the cluster attributes to disable termination protection. For instructions, see Terminate an Amazon EMR Cluster with multiple primary nodes.
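The shutdown order described in the preceding note (disable termination protection first, then terminate) can be sketched with the AWS SDK for Python (Boto3). This is a minimal illustration, not a prescribed procedure: the helper name `shut_down_protected_cluster`, the Region, and the cluster ID are hypothetical placeholders, and the sketch assumes the caller has permission to call `SetTerminationProtection` and `TerminateJobFlows`.

```python
def shut_down_protected_cluster(emr_client, cluster_id):
    """Disable termination protection, then request cluster termination.

    emr_client is expected to be a Boto3 EMR client, for example
    boto3.client("emr"). The helper name is hypothetical.
    """
    # Step 1: turn off termination protection on the running cluster.
    emr_client.set_termination_protection(
        JobFlowIds=[cluster_id],
        TerminationProtected=False,
    )
    # Step 2: with protection disabled, ask Amazon EMR to shut the cluster down.
    emr_client.terminate_job_flows(JobFlowIds=[cluster_id])


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#
#   import boto3
#   emr = boto3.client("emr", region_name="us-east-1")
#   shut_down_protected_cluster(emr, "j-XXXXXXXXXXXXX")
```

Passing the client in as a parameter keeps the helper independent of any particular Region or credential setup.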
Configure a cluster to terminate after step execution
When you configure termination after step execution, the cluster starts, runs bootstrap actions, and then runs the steps that you specify. As soon as the last step completes, Amazon EMR terminates the cluster's Amazon EC2 instances. Clusters that you launch with the Amazon EMR API have step execution enabled by default.
Termination after step execution is effective for clusters that perform a periodic processing task, such as a daily data processing run. Step execution also helps you ensure that you are billed only for the time required to process your data. For more information about steps, see Submit work to an Amazon EMR cluster.
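As a rough sketch of how this choice looks when you launch a cluster with the AWS SDK for Python (Boto3), the `RunJobFlow` request carries a `KeepJobFlowAliveWhenNoSteps` flag under `Instances`: `False` asks for termination after the last step, and `True` keeps the cluster running. Everything else below is an assumption made for illustration: the helper name `build_run_job_flow_request`, the release label, the instance types and count, the default role names, and the S3 path in the usage comment are placeholders to replace with your own values.

```python
def build_run_job_flow_request(name, steps, keep_alive=False):
    """Build keyword arguments for a Boto3 EMR run_job_flow call.

    keep_alive=False requests termination after the last step completes;
    keep_alive=True requests a long-running cluster.
    All concrete values below are illustrative placeholders.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # placeholder release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # placeholder instance type
            "SlaveInstanceType": "m5.xlarge",   # placeholder instance type
            "InstanceCount": 3,                 # placeholder cluster size
            # The setting this topic is about: keep running or terminate
            # once there are no more steps to run.
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


# Example usage (requires boto3 and AWS credentials; values are placeholders):
#
#   import boto3
#   daily_step = {
#       "Name": "daily-processing",
#       "ActionOnFailure": "TERMINATE_CLUSTER",
#       "HadoopJarStep": {
#           "Jar": "command-runner.jar",
#           "Args": ["spark-submit", "s3://amzn-s3-demo-bucket/job.py"],
#       },
#   }
#   emr = boto3.client("emr", region_name="us-east-1")
#   request = build_run_job_flow_request("daily-run", [daily_step])
#   response = emr.run_job_flow(**request)
#   print(response["JobFlowId"])
```

Building the request as a plain dictionary makes it easy to inspect or log the termination setting before any cluster is launched.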