Amazon EMR
Management Guide

Configure Cluster Scale-Down

With Amazon EMR versions 5.1.0 and later, you can configure the scale-down behavior of Amazon EC2 instances when a termination request is issued. A termination request can come from an automatic scaling policy that triggers a scale-in activity, or from the manual removal of instances from an instance group when a cluster is resized. There are two options for scale-down behavior: terminate at the instance-hour boundary for Amazon EC2 billing, or terminate at task completion. You can configure this behavior using the AWS Management Console for Amazon EMR, the AWS CLI, or the Amazon EMR API.

Terminating at the instance-hour boundary is the default, regardless of when the request to terminate the instance was submitted. Because Amazon EC2 charges per full hour regardless of when the instance is terminated, this behavior enables applications running on your cluster to use Amazon EC2 instances more cost-effectively in a dynamically scaling environment. Amazon EMR terminates the nodes with the fewest tasks, or no tasks, first.

For clusters created using Amazon EMR versions 4.1.0 up to, but not including, 5.1.0, Amazon EMR terminates Amazon EC2 instances at task completion by default; specifying termination at the instance-hour boundary is not available in these versions. When terminate at task completion is specified, Amazon EMR blacklists and drains tasks from nodes before terminating the Amazon EC2 instances, regardless of the instance-hour boundary.

With either behavior, Amazon EMR does not terminate Amazon EC2 instances in core instance groups if doing so could lead to HDFS corruption.

Configuring Amazon EMR Scale-Down Behavior


This configuration feature is only available for Amazon EMR releases 5.1.0 or later.

You can use the AWS Management Console, the AWS CLI, or the Amazon EMR API to configure scale-down behavior when you create a cluster. Configuring scale-down using the AWS Management Console is done in the Step 3: General Cluster Settings screen when you create a cluster using Advanced options.

[Figure: Scale-down configuration for Amazon EMR instance termination.]

When you create a cluster using the AWS CLI, use the --scale-down-behavior option to specify either TERMINATE_AT_INSTANCE_HOUR or TERMINATE_AT_TASK_COMPLETION.
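For example, a create-cluster call that opts in to terminating at task completion might look like the following sketch; the cluster name, instance type and count, and key pair name are placeholders for illustration:

```shell
# Create a cluster that terminates instances at task completion when
# scaling in. Name, instance settings, and key pair are placeholders.
aws emr create-cluster \
  --name "example-cluster" \
  --release-label emr-5.1.0 \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=MyKeyPair \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION
```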

Terminate at Task Completion

Amazon EMR allows you to scale down your cluster without affecting your workload. During a resize-down operation, Amazon EMR gracefully decommissions YARN, HDFS, and other daemons on core and task nodes without losing data or interrupting jobs. Amazon EMR shrinks instance groups only when the work assigned to them has completed and they are idle. For YARN NodeManager decommissioning, you can manually adjust the time a node waits for decommissioning by setting yarn.resourcemanager.decommissioning.timeout inside /etc/hadoop/conf/yarn-site.xml; this setting is propagated dynamically. If there are still running containers or YARN applications when the decommissioning timeout passes, the node is forcibly decommissioned and YARN reschedules the affected containers on other nodes. The default value is 3600s (one hour), which means a YARN node that has been issued a resize-down request is decommissioned within one hour or less. You can set this timeout to an arbitrarily high value to force graceful shrink to wait longer.
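As a sketch, the timeout could be raised in yarn-site.xml as follows; the 7200-second value is an arbitrary example, not a recommendation:

```xml
<!-- /etc/hadoop/conf/yarn-site.xml -->
<property>
  <!-- Time (in seconds) a decommissioning node waits for running
       containers and applications to finish before it is forcibly
       decommissioned. Default is 3600 (one hour); 7200 is an
       example value only. -->
  <name>yarn.resourcemanager.decommissioning.timeout</name>
  <value>7200</value>
</property>
```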

Task Node Groups

Amazon EMR intelligently selects instances that are not running tasks and removes them from the cluster first. If all instances in the cluster are being used, Amazon EMR waits for tasks to complete on a given instance before removing it from the cluster. The default wait time is one hour, and this value can be changed by setting yarn.resourcemanager.decommissioning.timeout; Amazon EMR picks up the new setting dynamically. You can set this to an arbitrarily large value to ensure that no tasks are killed while shrinking the cluster.

Core Node Groups

On core nodes, both the YARN NodeManager and HDFS DataNode daemons must be decommissioned in order for the instance group to shrink. For YARN, graceful shrink ensures that a node marked for decommissioning is only transitioned to the DECOMMISSIONED state if there are no pending or incomplete containers or applications. Decommissioning finishes immediately if there are no running containers on the node when decommissioning begins.

For HDFS, graceful shrink ensures that the target capacity of HDFS is large enough to hold all existing blocks. If the target capacity is not large enough, only some of the core instances are decommissioned, so that the remaining nodes can hold the data currently residing in HDFS. To allow further decommissioning, ensure that there is additional HDFS capacity. You should also try to minimize write I/O before attempting to shrink instance groups, because heavy writes may delay the completion of the resize operation.

Another limit is the default replication factor, dfs.replication, inside /etc/hadoop/conf/hdfs-site.xml. Amazon EMR configures the value based on the number of instances in the cluster: 1 for clusters with 1-3 instances, 2 for clusters with 4-9 instances, and 3 for clusters with 10 or more instances. Graceful shrink does not allow you to shrink the number of core nodes below the HDFS replication factor; this prevents HDFS from being unable to close files due to insufficient replicas. To circumvent this limit, you must lower the replication factor and restart the NameNode daemon.
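If you do need to shrink below the current replication factor, the setting lives in hdfs-site.xml; the value of 1 below is an illustrative example, not a recommendation:

```xml
<!-- /etc/hadoop/conf/hdfs-site.xml -->
<property>
  <!-- Default number of block replicas for newly created files.
       The value 1 is an example only; choose a value that matches
       your durability needs. Restart the NameNode after changing
       this setting. -->
  <name>dfs.replication</name>
  <value>1</value>
</property>
```

Note that dfs.replication applies to files written after the change; blocks of existing files keep their previous replication factor unless you change it explicitly (for example, with hdfs dfs -setrep).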