Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Resize a Running Cluster

You can increase or decrease the number of nodes in a running cluster. A cluster contains a single master node, which controls any slave nodes that are present. There are two types of slave nodes: core nodes, which run your Hadoop jobs and hold data to process in the Hadoop Distributed File System (HDFS), and task nodes, which also run your Hadoop jobs but do not contain HDFS. After a cluster is running, you can increase, but not decrease, the number of core nodes. You can both increase and decrease the number of task nodes.

You can modify the size of a running cluster using the console, the CLI, or the API.

Nodes within a cluster are managed by instance groups. All clusters require a master instance group containing a single master node. Clusters using slave nodes require a core instance group that contains at least one core node. Additionally, if a cluster has a core instance group, it can also have a task instance group containing one or more task nodes.

Note

You must have at least one core node at cluster creation in order to resize. In other words, single node clusters cannot be resized.

When your cluster runs, Hadoop determines the number of mapper and reducer tasks needed to process the data. Larger clusters should have more tasks for better resource use and shorter processing time. Typically, an Amazon EMR cluster remains the same size during its entire execution, so you set the number of tasks when you create the cluster. When you resize a running cluster, the available processing capacity changes during cluster execution. Therefore, instead of using a fixed number of tasks, you can vary the number of tasks during the life of the cluster. There are two configuration options to help set the ideal number of tasks:

  • mapred.map.tasksperslot

  • mapred.reduce.tasksperslot

You can set both options in the mapred-conf.xml file. When you submit a job to the cluster, the job client checks the current total number of map and reduce slots available cluster-wide. The job client then uses the following equations to set the number of tasks:

  • mapred.map.tasks = mapred.map.tasksperslot * map slots in cluster

  • mapred.reduce.tasks = mapred.reduce.tasksperslot * reduce slots in cluster

The job client only reads the tasksperslot parameter if the number of tasks is not configured. You can override the number of tasks at any time, either for all clusters via a bootstrap action or individually per job by adding a step to change the configuration.
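The equations above, together with the override rule, can be modeled in a few lines of Python. This is an illustrative sketch only, not the job client's actual code: the function name planned_tasks and the slot counts are hypothetical, and truncating the product to an integer is an assumption.

```python
from typing import Optional


def planned_tasks(slots_in_cluster: int,
                  tasks_per_slot: float,
                  configured_tasks: Optional[int] = None) -> int:
    """Model how a task count is chosen for one task kind (map or reduce).

    Hypothetical helper: an explicitly configured task count wins; otherwise
    the count is tasksperslot * slots currently available in the cluster.
    Truncating to an integer is an assumption made for this sketch.
    """
    if configured_tasks is not None:
        return configured_tasks
    return int(tasks_per_slot * slots_in_cluster)


# Assumed example: 8 map slots and mapred.map.tasksperslot = 2.
print(planned_tasks(8, 2))        # 16
# After a resize that doubles the map slots, the next job gets more tasks.
print(planned_tasks(16, 2))       # 32
# An explicit mapred.map.tasks setting overrides the per-slot calculation.
print(planned_tasks(16, 2, 10))   # 10
```

Because the slot count is read at job submission, a job submitted after a resize picks up the new cluster size without any change to the per-slot setting.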

Amazon EMR withstands slave node failures and continues cluster execution even if a slave node becomes unavailable. Amazon EMR automatically provisions additional slave nodes to replace those that fail.

You can have a different number of slave nodes for each cluster step. You can also add a step to a running cluster to modify the number of its slave nodes. Because all steps are guaranteed to run sequentially, you can specify the number of running slave nodes for any cluster step.

Resize a Cluster Using the Console

You can use the Amazon EMR console to resize a cluster while it is running.

To resize a running cluster using the console

  1. From the Cluster List page, click a cluster to resize.

  2. On the Cluster Details page, click Resize. Alternatively, you can expand the Hardware Configuration section, click the Resize button adjacent to the core or task nodes, and increase or decrease the number of instances for the instance group.

  3. To add task nodes to a cluster that has none, click Add Task Nodes to choose the task node type, the number of task nodes, and whether the task nodes are spot instances.

Note

You can only increase the number of core nodes, but you can both increase and decrease the number of task nodes.

When you change the number of nodes, the Amazon EMR console updates the status of the instance group through the Provisioning and Resizing states until the nodes are ready, and indicates in brackets the newly requested number of nodes. When the change to the node count finishes, the instance group returns to the Running state.

Parameters for Resizing Clusters

The Amazon EMR CLI provides parameters so you can control how you resize a running cluster.

Parameters to Increase or Decrease Nodes

You can increase or decrease the number of nodes in a running cluster. The parameters are listed in the following table.

Parameter                                   Description
--modify-instance-group INSTANCE_GROUP_ID   Modify an existing instance group.
--instance-count INSTANCE_COUNT             Set the count of nodes for an instance group.

Note

You are only allowed to increase the number of nodes in a core instance group.

You can increase or decrease the number of nodes in a task instance group.

Master instance groups cannot be modified.
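The rules in this note can be checked before a resize request is sent. The following Python sketch is illustrative only; the function name check_resize is hypothetical and is not part of the Amazon EMR CLI or API.

```python
def check_resize(role: str, current_count: int, requested_count: int) -> None:
    """Raise ValueError if a requested node count breaks the resize rules.

    Hypothetical helper encoding the note above: master groups cannot be
    modified, core groups can only grow, task groups can grow or shrink.
    """
    role = role.upper()
    if role == "MASTER":
        raise ValueError("master instance groups cannot be modified")
    if role not in ("CORE", "TASK"):
        raise ValueError(f"unknown instance group role: {role}")
    if role == "CORE" and requested_count < current_count:
        raise ValueError("core instance groups can only grow")


check_resize("TASK", 4, 2)   # allowed: task groups can shrink
check_resize("CORE", 2, 5)   # allowed: core groups can grow
try:
    check_resize("CORE", 5, 2)
except ValueError as err:
    print(err)               # core instance groups can only grow
```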

Parameters to Add an Instance Group to a Running Cluster

You can add an instance group to your running cluster. The parameters are listed in the following table.

Parameter                         Description
--add-instance-group ROLE         Add an instance group to an existing cluster. The role may be TASK only. Currently, Amazon Elastic MapReduce (Amazon EMR) does not permit adding core or master instance groups to a running cluster.
--instance-count INSTANCE_COUNT   Set the count of nodes for an instance group.
--instance-type INSTANCE_TYPE     Set the type of EC2 instance to create nodes for an instance group.

Parameters to Specify an Instance Group when Creating a Cluster

You can specify instance groups when you create a cluster. The parameters are listed in the following table.

Parameter                         Description
--instance-group TYPE             Set the instance group type. The type is MASTER, CORE, or TASK.
--instance-count INSTANCE_COUNT   Set the count of nodes for an instance group.
--instance-type INSTANCE_TYPE     Set the type of EC2 instance for nodes in an instance group.

The --describe command describes all instance groups and node types. If you run elastic-mapreduce --jobflow JobFlowID --describe, you see a section called InstanceGroups. You can see that your cluster contains a master instance group and, potentially, core and task instance groups.

The following CLI commands show how to display information about the instance groups of a running cluster. In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

  • Linux, UNIX, and Mac OS X users:

    ./elastic-mapreduce --jobflow JobFlowID --describe
  • Windows users:

    ruby elastic-mapreduce --jobflow JobFlowID --describe

This returns information about the cluster similar to the following:

{
  "JobFlows": [
    {
      "Name": "Development Job Flow (requires manual termination)",
      "LogUri": "s3n:\/\/myawsbucket\/FileName\/",
      "ExecutionStatusDetail": {
        "StartDateTime": null,
        "EndDateTime": null,
        "LastStateChangeReason": "Starting instances",
        "CreationDateTime": DateTimeStamp,
        "State": "STARTING",
        "ReadyDateTime": null
      },
      "Steps": [],
      "Instances": {
        "MasterInstanceId": null,
        "Ec2KeyName": "KeyName",
        "NormalizedInstanceHours": 0,
        "InstanceCount": 5,
        "Placement": {
          "AvailabilityZone": "us-east-1a"
        },
        "SlaveInstanceType": "m1.small",
        "HadoopVersion": "0.20",
        "MasterPublicDnsName": null,
        "KeepJobFlowAliveWhenNoSteps": true,
        "InstanceGroups": [
          {
            "StartDateTime": null,
            "SpotPrice": null,
            "Name": "Master Instance Group",
            "InstanceRole": "MASTER",
            "EndDateTime": null,
            "LastStateChangeReason": "",
            "CreationDateTime": DateTimeStamp,
            "LaunchGroup": null,
            "InstanceGroupId": "InstanceGroupID",
            "State": "PROVISIONING",
            "Market": "ON_DEMAND",
            "ReadyDateTime": null,
            "InstanceType": "m1.small",
            "InstanceRunningCount": 0,
            "InstanceRequestCount": 1
          },
          {
            "StartDateTime": null,
            "SpotPrice": null,
            "Name": "Task Instance Group",
            "InstanceRole": "TASK",
            "EndDateTime": null,
            "LastStateChangeReason": "",
            "CreationDateTime": DateTimeStamp,
            "LaunchGroup": null,
            "InstanceGroupId": "InstanceGroupID",
            "State": "PROVISIONING",
            "Market": "ON_DEMAND",
            "ReadyDateTime": null,
            "InstanceType": "m1.small",
            "InstanceRunningCount": 0,
            "InstanceRequestCount": 2
          },
          {
            "StartDateTime": null,
            "SpotPrice": null,
            "Name": "Core Instance Group",
            "InstanceRole": "CORE",
            "EndDateTime": null,
            "LastStateChangeReason": "",
            "CreationDateTime": DateTimeStamp,
            "LaunchGroup": null,
            "InstanceGroupId": "InstanceGroupID",
            "State": "PROVISIONING",
            "Market": "ON_DEMAND",
            "ReadyDateTime": null,
            "InstanceType": "m1.small",
            "InstanceRunningCount": 0,
            "InstanceRequestCount": 2
          }
        ],
        "MasterInstanceType": "m1.small"
      },
      "bootstrapActions": [],
      "JobFlowId": "JobFlowID"
    }
  ]
}
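The InstanceGroups section of output like the above can be summarized with a short script. The following Python sketch assumes the output has been captured as valid JSON; the sample is trimmed, and the instance group IDs (ig-master, ig-task, ig-core) are hypothetical stand-ins for the InstanceGroupID placeholders.

```python
import json

# Trimmed, hypothetical sample in the shape shown above. Placeholders are
# replaced with made-up values so that the text parses as JSON.
SAMPLE = """
{"JobFlows": [{"JobFlowId": "JobFlowID", "Instances": {"InstanceGroups": [
  {"InstanceRole": "MASTER", "InstanceGroupId": "ig-master",
   "State": "PROVISIONING", "InstanceRunningCount": 0, "InstanceRequestCount": 1},
  {"InstanceRole": "TASK", "InstanceGroupId": "ig-task",
   "State": "PROVISIONING", "InstanceRunningCount": 0, "InstanceRequestCount": 2},
  {"InstanceRole": "CORE", "InstanceGroupId": "ig-core",
   "State": "PROVISIONING", "InstanceRunningCount": 0, "InstanceRequestCount": 2}
]}}]}
"""


def summarize_instance_groups(describe_output: str) -> list:
    """Return (role, group id, state, running, requested) for each instance group."""
    rows = []
    for flow in json.loads(describe_output)["JobFlows"]:
        for group in flow["Instances"]["InstanceGroups"]:
            rows.append((group["InstanceRole"], group["InstanceGroupId"],
                         group["State"], group["InstanceRunningCount"],
                         group["InstanceRequestCount"]))
    return rows


for row in summarize_instance_groups(SAMPLE):
    print(row)
```

Comparing the running count with the requested count for each group is a simple way to see whether a resize is still in progress.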

Arrested State

An instance group goes into the arrested state if it encounters too many errors while trying to start new cluster nodes. For example, if new nodes fail while performing bootstrap actions, the instance group goes into the arrested state rather than continuously provisioning new nodes. After you resolve the underlying issue, reset the desired number of nodes on the cluster's instance group; the instance group then resumes allocating nodes.

The --describe command returns all instance groups and node types, so you can see the state of the instance groups for the cluster. If Amazon EMR detects any kind of fault with an instance group, it changes the group's state to ARRESTED.

Use the --modify-instance-group command to reset a cluster in the ARRESTED state.

Modifying the instance group instructs Amazon EMR to attempt to provision nodes again. No running nodes are restarted or terminated.

To reset a cluster in an arrested state

  • Enter the --modify-instance-group command as follows:

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --modify-instance-group InstanceGroupID \
      --instance-count COUNT
    • Windows users:

      ruby elastic-mapreduce --modify-instance-group InstanceGroupID --instance-count COUNT

    InstanceGroupID is the ID of the arrested instance group, and COUNT is the number of nodes you want in the instance group.

Tip

You do not need to change the number of nodes from the original configuration to free a running cluster. Set --instance-count to the same count as the original setting.
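The reset described in this section can be scripted by building one command per arrested instance group, reusing each group's current requested count as the tip suggests. This Python sketch only builds the argument lists and does not run them; the function name reset_commands and the group IDs are hypothetical, and the input is assumed to have the shape of the InstanceGroups entries returned by --describe.

```python
def reset_commands(instance_groups: list) -> list:
    """Build one CLI argument list per ARRESTED group.

    Each command requests the group's current InstanceRequestCount again,
    which asks Amazon EMR to retry provisioning without changing the size.
    """
    commands = []
    for group in instance_groups:
        if group["State"] == "ARRESTED":
            commands.append([
                "./elastic-mapreduce",
                "--modify-instance-group", group["InstanceGroupId"],
                "--instance-count", str(group["InstanceRequestCount"]),
            ])
    return commands


# Hypothetical instance groups in the shape returned by --describe.
groups = [
    {"InstanceGroupId": "ig-core", "State": "RUNNING", "InstanceRequestCount": 2},
    {"InstanceGroupId": "ig-task", "State": "ARRESTED", "InstanceRequestCount": 4},
]
for command in reset_commands(groups):
    print(" ".join(command))
```

Each argument list could be passed to a process launcher from the directory where the Amazon EMR CLI is installed; resolve the underlying issue first, or the group is likely to become arrested again.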

Legacy Clusters

Before October 2010, Amazon EMR did not have the concept of instance groups, and only one type of slave node existed. Clusters built before the option to resize running clusters was available are considered legacy clusters. Legacy clusters reference slaveInstanceType and other now-deprecated fields. Amazon EMR continues to support legacy clusters; you do not need to modify them to run them correctly.

Cluster Behavior

If you run a legacy cluster and only configure master and slave nodes, you observe a slaveInstanceType and other deprecated fields associated with your clusters.

Mapping Legacy Clusters to Instance Groups

Before October 2010, all cluster nodes were either master nodes or slave nodes. An Amazon EMR configuration could typically be represented like the following diagram.

Old Amazon EMR Model

1 A legacy cluster launches and a request is sent to Amazon EMR to start the cluster.
2 Amazon EMR creates a Hadoop cluster.
3 The legacy cluster runs on a cluster consisting of a single master node and the specified number of slave nodes.

Clusters created using the older model are fully supported and function as originally designed. The Amazon EMR API and commands map directly to the new model. Master nodes remain master nodes and become part of the master instance group. Slave nodes still run HDFS and become core nodes and join the core instance group.

Note

No task instance group or task nodes are created as part of a legacy cluster; however, you can add them to a running cluster at any time.

The following diagram illustrates how a legacy cluster now maps to master and core instance groups.

Old Amazon EMR Model Remapped to Current Architecture

1 A request is sent to Amazon EMR to start a cluster.
2 Amazon EMR creates a Hadoop cluster with a master instance group and core instance group.
3 The master node is added to the master instance group.
4 The slave nodes are added to the core instance group.
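The remapping above can be expressed as a small function. This Python sketch is illustrative only: the function name is hypothetical, the field names follow the --describe output shown earlier, and it assumes that the legacy instance count includes the master node.

```python
def legacy_to_instance_groups(master_type: str, slave_type: str,
                              instance_count: int) -> list:
    """Map legacy master/slave fields to master and core instance groups.

    Assumption: instance_count counts the master node plus all slave nodes,
    so the core group receives instance_count - 1 nodes.
    """
    if instance_count < 1:
        raise ValueError("a cluster needs at least a master node")
    groups = [{"InstanceRole": "MASTER", "InstanceType": master_type,
               "InstanceRequestCount": 1}]
    slave_count = instance_count - 1
    if slave_count > 0:
        # Slave nodes run HDFS, so they become core nodes, never task nodes.
        groups.append({"InstanceRole": "CORE", "InstanceType": slave_type,
                       "InstanceRequestCount": slave_count})
    return groups


for group in legacy_to_instance_groups("m1.small", "m1.small", 5):
    print(group)
```

No task group appears in the result, which matches the note above: task nodes exist only if you add them to the running cluster later.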

Library Files

Amazon EMR provides a JAR file that you can run as a cluster step to resize a cluster programmatically instead of directly through the CLI.

The JAR file to programmatically resize a running cluster is available at s3://elasticmapreduce/libs/resize-job-flow/0.1/resize-job-flow.jar and supports the optional arguments described in the following table.

Option                                         Description
--help                                         List all help information.
--modify-instance-group ROLE/InstanceGroupID   Apply changes to the named instance group, specified by either role or Instance Group ID. Instance group roles: MASTER, CORE, or TASK.
--set-instance-count <COUNT>                   Change the number of nodes of the named instance group.
--add-instance-group <ROLE>                    Apply operations to the named instance group. Instance group roles: TASK. Currently, Amazon EMR does not permit adding core or master instance groups to a running cluster.
--instance-count <COUNT>                       Specify the number of nodes for the named instance group.
--instance-type <TYPE>                         Specify the type of EC2 instances used to create nodes in the new instance group.
--no-wait                                      The cluster continues in the RUNNING state after the step makes a request to create or resize an instance group.
--on-failure STATE                             Step state if one of the resizing actions fails: FAIL or CONTINUE.
--on-arrested <STATE>                          Cluster state if an instance group enters the ARRESTED state: FAIL, WAIT, or CONTINUE.

The JAR file is configured to write to stderr. Only error and fatal messages are reported. The JAR file includes the source code.

The cluster step looks similar to:

s3://elasticmapreduce/libs/resize-job-flow/0.1/resize-job-flow.jar \
--add-instance-group task --instance-type InstanceType --instance-count 10
        

For more information about how to add a cluster step, see Submit Work to a Cluster.
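If you assemble steps in code, the JAR location and its arguments can be kept together in a small structure. The following Python sketch is illustrative only: the function name and the "Jar" and "Args" keys are hypothetical labels chosen for this example, and only options listed in the table above are used.

```python
import json

RESIZE_JAR = "s3://elasticmapreduce/libs/resize-job-flow/0.1/resize-job-flow.jar"


def add_task_group_step(instance_type: str, instance_count: int,
                        no_wait: bool = False) -> dict:
    """Describe a step that adds a task instance group to a running cluster.

    Returns a plain dictionary; the key names are hypothetical and would need
    to be adapted to whatever tool you use to submit the step.
    """
    args = ["--add-instance-group", "task",
            "--instance-type", instance_type,
            "--instance-count", str(instance_count)]
    if no_wait:
        # Do not hold the step open while the new group is provisioned.
        args.append("--no-wait")
    return {"Jar": RESIZE_JAR, "Args": args}


print(json.dumps(add_task_group_step("m1.small", 10), indent=2))
```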