Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Protect a Cluster from Termination

Termination protection ensures that the EC2 instances in your job flow are not shut down by accident or error. This protection is especially useful if your cluster contains data in instance storage that you need to recover before those instances are terminated.

By default, termination protection is disabled on clusters. When termination protection is not enabled, you can terminate clusters through calls to the TerminateJobFlows API, through the Amazon EMR console, or by using the command line interface. In addition, the master node may terminate a task node that has become unresponsive or has returned an error.

When termination protection is enabled, you must explicitly remove termination protection from the cluster before you can terminate the cluster. With termination protection enabled, TerminateJobFlows can't terminate the cluster and users can't terminate the cluster using the CLI. Users terminating the cluster using the Amazon EMR console receive an extra confirmation box asking if they want to remove termination protection before terminating the cluster.

If you attempt to terminate a protected cluster with the API or CLI, the API returns an error, and the CLI exits with a non-zero return code.
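
Because the CLI reports a refused termination through its exit status, a wrapper script can check that status instead of assuming the cluster is gone. The following is a minimal sketch, not part of the Amazon EMR CLI. The function name terminate_cluster and the EMR_CLI variable are hypothetical, and because the specific non-zero code is not documented here, the sketch only distinguishes zero from non-zero.

```shell
# Path to the Amazon EMR CLI. EMR_CLI is a hypothetical override used by this sketch.
EMR_CLI="${EMR_CLI:-./elastic-mapreduce}"

# terminate_cluster JOBFLOW_ID
# Hypothetical helper: requests termination and reports whether the CLI accepted it.
terminate_cluster() {
  if "$EMR_CLI" --terminate "$1"; then
    echo "Termination requested for $1"
  else
    # In the else branch, $? still holds the exit status of the CLI call.
    status=$?
    echo "Could not terminate $1 (CLI exit status $status); termination protection may be enabled" >&2
    return "$status"
  fi
}
```

You might call it as terminate_cluster JobFlowID from the directory where you installed the CLI. A non-zero status can have causes other than termination protection, so treat the message as a hint rather than a diagnosis.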

The ActionOnFailure setting of a step determines what the cluster does when that step fails. The possible values for this setting are:

  • TERMINATE_JOB_FLOW: If the step fails, terminate the job flow. If the job flow has termination protection enabled AND keep alive enabled, it will not terminate.

  • CANCEL_AND_WAIT: If the step fails, cancel the remaining steps. If the cluster has keep alive enabled, the cluster will not terminate.

  • CONTINUE: If the step fails, continue to the next step.
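
The interaction between ActionOnFailure, termination protection, and keep alive in the list above can be sketched as a small decision function. This is an illustration only: the function name and its "true"/"false" arguments are hypothetical, and the CANCEL_AND_WAIT case without keep alive assumes that the cluster shuts down once no steps remain, which the list does not state explicitly.

```shell
# outcome_after_step_failure ACTION PROTECTED KEEP_ALIVE
# Hypothetical helper that prints what happens to the cluster when a step fails.
# PROTECTED and KEEP_ALIVE are the strings "true" or "false".
outcome_after_step_failure() {
  action=$1
  protected=$2
  keep_alive=$3
  case "$action" in
    TERMINATE_JOB_FLOW)
      # The cluster survives only when termination protection AND keep alive are both enabled.
      if [ "$protected" = true ] && [ "$keep_alive" = true ]; then
        echo "cluster keeps running"
      else
        echo "cluster terminates"
      fi
      ;;
    CANCEL_AND_WAIT)
      # Remaining steps are cancelled; keep alive decides whether the cluster stays up.
      # Assumption: without keep alive, the cluster terminates once no steps remain.
      if [ "$keep_alive" = true ]; then
        echo "remaining steps cancelled, cluster keeps running"
      else
        echo "remaining steps cancelled, cluster terminates"
      fi
      ;;
    CONTINUE)
      echo "next step runs"
      ;;
    *)
      echo "unknown ActionOnFailure value: $action" >&2
      return 2
      ;;
  esac
}
```

For example, outcome_after_step_failure TERMINATE_JOB_FLOW true false prints "cluster terminates", because termination protection without keep alive does not keep the cluster running after a failed step.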

Note

Use cluster termination protection judiciously because it can lead to additional charges for the persistent EC2 instances.

Termination Protection in Amazon EMR and Amazon EC2

Termination protection of clusters in Amazon EMR is analogous to setting the disableAPITermination flag on an EC2 instance. In the event of a conflict between the termination protection set in Amazon EC2 and that set in Amazon EMR, the Amazon EMR cluster protection status overrides that set by Amazon EC2 on the given instance. For example, if you use the Amazon EC2 console to enable termination protection on an EC2 instance in an Amazon EMR cluster that has termination protection disabled, Amazon EMR turns off termination protection on that EC2 instance and shuts down the instance when the rest of the cluster terminates.

Termination Protection and Spot Instances

Amazon EMR termination protection does not prevent an Amazon EC2 Spot Instance from terminating when the Spot Price rises above the maximum bid price. For more information about the behavior of Spot Instances in Amazon EMR, see Lower Costs with Spot Instances (Optional).

Termination Protection and Auto-terminate

Enabling termination protection on a cluster is similar to disabling the auto-terminate option in the console (or using the --alive argument in the CLI), but the protections offered are different. Disabling auto-terminate causes instances in a cluster to persist after steps have successfully completed, but still allows the cluster to be terminated by errors and by calls to TerminateJobFlows. Enabling termination protection also causes the cluster to persist after steps have successfully completed, but does not allow cluster termination by user action, errors, or TerminateJobFlows calls.

The following table compares the protections offered by termination protection and auto-terminate.

Termination cause        Termination protection enabled    Auto-terminate disabled
Successful completion    Protected                         Protected
User actions             Protected                         Not protected
TerminateJobFlows        Protected                         Not protected
Errors                   Protected                         Not protected

Note

By default, auto-terminate is disabled for clusters launched using the console and the CLI. Clusters launched using the API have auto-terminate enabled.
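
The comparison between the two settings can also be written as a lookup, which may help when you reason about which setting a script should rely on. This is a sketch under stated assumptions: the function name and the cause and setting labels are hypothetical, and the "user-action" result for auto-terminate is inferred from the fact that an unprotected cluster can be terminated from the console or CLI.

```shell
# survives_termination_cause CAUSE SETTING
# CAUSE:   completion | user-action | terminate-job-flows | error
# SETTING: termination-protection | auto-terminate-disabled
# Hypothetical helper: returns 0 if the cluster keeps running, 1 if it
# terminates, and 2 if the arguments are not recognized.
survives_termination_cause() {
  case "$2:$1" in
    termination-protection:completion|termination-protection:user-action|\
    termination-protection:terminate-job-flows|termination-protection:error)
      # Termination protection guards against every cause listed in the comparison.
      return 0
      ;;
    auto-terminate-disabled:completion)
      # Disabling auto-terminate only keeps the cluster alive after successful steps.
      return 0
      ;;
    auto-terminate-disabled:user-action|auto-terminate-disabled:terminate-job-flows|\
    auto-terminate-disabled:error)
      return 1
      ;;
    *)
      echo "unrecognized cause or setting: $1 $2" >&2
      return 2
      ;;
  esac
}
```

The lookup does not cover Spot Instance interruptions, which termination protection does not prevent.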

Protecting a New Cluster

You can specify that a new cluster be protected from termination during the cluster creation.

Launch a cluster with termination protection using the console

  1. Open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Click Create cluster.

  3. In the Cluster Configuration section, set the Termination protection field to Yes. In the console, Yes is the default setting.

  4. Continue through the configuration sections, following the directions for the type of cluster you are launching. For more information, see Plan an Amazon EMR Cluster.

Launch a cluster with termination protection using the CLI

  • Specify --with-termination-protection during the cluster creation call. The following example shows setting termination protection on the WordCount sample application.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    Note

    The Hadoop streaming syntax is different between Hadoop 1.x and Hadoop 2.x.

    For Hadoop 2.x, use the following command:

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --ami-version 3.0.3 \
        --instance-type m1.xlarge --num-instances 2 \
        --stream --arg "-files" --arg "s3://elasticmapreduce/samples/wordcount/wordSplitter.py" \
        --input s3://elasticmapreduce/samples/wordcount/input \
        --output s3://myawsbucket/output/2014-01-16 --mapper wordSplitter.py --reducer aggregate \
        --with-termination-protection
    • Windows users:

      ruby elastic-mapreduce --create --alive --ami-version 3.0.3 --instance-type m1.xlarge --num-instances 2 --stream --arg "-files" --arg "s3://elasticmapreduce/samples/wordcount/wordSplitter.py" --input s3://elasticmapreduce/samples/wordcount/input --output s3://myawsbucket/output/2014-01-16 --mapper wordSplitter.py --reducer aggregate --with-termination-protection

    For Hadoop 1.x, use the following command:

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive \
        --instance-type m1.xlarge --num-instances 2 --stream \
        --input s3://elasticmapreduce/samples/wordcount/input \
        --output s3://myawsbucket/wordcount/output/2011-03-25 \
        --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py --reducer aggregate \
        --with-termination-protection
    • Windows users:

      ruby elastic-mapreduce --create --alive --instance-type m1.xlarge --num-instances 2 --stream --input s3://elasticmapreduce/samples/wordcount/input --output s3://myawsbucket/wordcount/output/2011-03-25 --mapper s3://elasticmapreduce/samples/wordcount/wordSplitter.py --reducer aggregate --with-termination-protection

    For more information about launching clusters using the CLI, see Plan an Amazon EMR Cluster.

Protecting an Existing Cluster

You can add termination protection to a running cluster using the console, the CLI, or the API.

To enable termination protection for an existing cluster using the console

  1. Open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.

  2. On the Cluster List page, click the link for your cluster.

  3. On the Cluster Details page, in the Summary section, for Termination protection, click Change.

  4. Click On and then click the check mark icon to enable termination protection.


To enable termination protection for an existing cluster using the CLI

  • Set the --set-termination-protection flag to true. This is shown in the following example, where JobFlowID is the identifier of the cluster on which to enable termination protection.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --set-termination-protection true --jobflow JobFlowID
    • Windows users:

      ruby elastic-mapreduce --set-termination-protection true --jobflow JobFlowID


Terminating a Protected Cluster

To terminate a protected cluster, you must first disable termination protection. After termination protection is disabled, you can terminate the cluster from the Amazon EMR console, CLI, or programmatically using the TerminateJobFlows API.

To terminate a cluster with termination protection set using the Amazon EMR console

  1. Open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.

  2. On the Cluster List page, click the link for your cluster.

  3. On the Cluster Details page, click Terminate.

  4. In the Terminate clusters dialog, for Termination protection, click Change.

  5. Click Off and then click the check mark icon to disable termination protection.

To terminate a cluster with termination protection set using the CLI

  1. Disable termination protection by setting the --set-termination-protection parameter to false. This is shown in the following example, where JobFlowID is the identifier of the cluster on which to disable termination protection.

    elastic-mapreduce --set-termination-protection false --jobflow JobFlowID
  2. Terminate the cluster using the --terminate parameter and specifying the cluster identifier of the cluster to terminate.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --terminate JobFlowID
    • Windows users:

      ruby elastic-mapreduce --terminate JobFlowID
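
The two CLI steps above can be combined in a small script that stops if the first step fails. This is a minimal sketch for Linux, UNIX, and Mac OS X users, not part of the Amazon EMR CLI. The function name terminate_protected_cluster and the EMR_CLI variable are hypothetical; the CLI options are the ones shown in the procedure.

```shell
# Path to the Amazon EMR CLI. EMR_CLI is a hypothetical override used by this sketch.
EMR_CLI="${EMR_CLI:-./elastic-mapreduce}"

# terminate_protected_cluster JOBFLOW_ID
# Hypothetical helper: disables termination protection, then terminates the cluster.
terminate_protected_cluster() {
  jobflow=$1

  # Step 1: remove termination protection. Stop if the CLI reports a failure.
  "$EMR_CLI" --set-termination-protection false --jobflow "$jobflow" || {
    echo "Failed to disable termination protection on $jobflow" >&2
    return 1
  }

  # Step 2: terminate the cluster, which is no longer protected.
  "$EMR_CLI" --terminate "$jobflow" || {
    echo "Failed to terminate $jobflow" >&2
    return 1
  }

  echo "Termination requested for $jobflow"
}
```

Run terminate_protected_cluster JobFlowID from the directory where you installed the CLI. Keeping the two calls separate, rather than ignoring the first result, avoids issuing a terminate request that is certain to be refused.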