Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Add Steps to a Cluster

This section describes the methods for adding steps to a cluster.

You can add steps to a running cluster only if you set the KeepJobFlowAliveWhenNoSteps parameter to True when you create the cluster. This value keeps the cluster running even after all of its steps have completed.

The Amazon EMR console does not support adding steps to a cluster.

The following procedure creates a simple cluster and then adds a step to the cluster.

To add a step to a cluster using the CLI

  1. Create a cluster:

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive 
    • Windows users:

      ruby elastic-mapreduce --create --alive 

    The --alive parameter keeps the cluster running even when all steps have been completed, unless you explicitly terminate it.

    The output looks similar to the following.

    Created cluster JobFlowID
  2. Add a step:

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce -j JobFlowID \
      --jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar \
      --arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br \
      --arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br \
      --arg hdfs:///cloudburst/output/1 \
      --arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 --arg 24 --arg 128 --arg 16
    • Windows users:

      ruby elastic-mapreduce -j JobFlowID --jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar --arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br --arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br --arg hdfs:///cloudburst/output/1 --arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 --arg 24 --arg 128 --arg 16

This command runs an example cluster step that downloads and runs the JAR file. The arguments are passed to the main function in the JAR file. If your JAR file does not have a manifest, specify the JAR file's main class using the --main-class option.
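The CLI invocation above can be sketched programmatically. The following Python helper is a hypothetical illustration (not part of the Amazon EMR CLI) of how the JobFlowID, JAR path, and --arg values from the example assemble into an argument vector, including the --main-class option for a JAR without a manifest:

```python
# Hypothetical helper: build the argument vector for the elastic-mapreduce
# CLI call shown above. The JobFlowID and S3 paths are the placeholders and
# sample paths from the example; the function name is illustrative only.
def build_add_step_argv(jobflow_id, jar, args, main_class=None):
    argv = ["./elastic-mapreduce", "-j", jobflow_id, "--jar", jar]
    if main_class is not None:
        # Needed only when the JAR file has no manifest naming a main class.
        argv += ["--main-class", main_class]
    for a in args:
        # Each application argument is passed as a separate --arg option.
        argv += ["--arg", str(a)]
    return argv

argv = build_add_step_argv(
    "JobFlowID",
    "s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar",
    ["s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br",
     "s3n://elasticmapreduce/samples/cloudburst/input/100k.br",
     "hdfs:///cloudburst/output/1",
     36, 3, 0, 1, 240, 48, 24, 24, 128, 16])
```

The resulting list could be handed to a process launcher such as subprocess.call; keeping the arguments as a list avoids shell-quoting problems with the long Windows-style one-line form.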

Example using the API

The steps parameter defines the location and input parameters for the Hadoop JAR steps that perform the processing on the input data. Each step is identified by a member number.

Typically, you specify all cluster steps in a RunJobFlow request. The advantage of AddJobFlowSteps is that you can add steps to a cluster that is already running on its EC2 instances. You typically add steps to modify the data processing or to aid in debugging when you are working interactively with the cluster; that is, you add steps while cluster execution is paused.

The name parameter helps you distinguish step results, so it is best to make each name unique. Amazon EMR does not check for the uniqueness of step names.

The remainder of the steps parameter specifies the JAR file and the input parameters used to process the data.
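The member numbering described above can be sketched in Python. This is an illustrative serializer (not an AWS library function) that flattens a list of step definitions into the Steps.member.N request parameters, using the same Name, ActionOnFailure, Jar, MainClass, and Args fields that appear in the sample request below:

```python
def flatten_steps(steps):
    """Serialize a list of step dicts into AddJobFlowSteps query parameters.

    Each step is identified by a member number N, starting at 1; each of its
    Hadoop JAR arguments gets its own Args.member.N index as well.
    """
    params = {}
    for n, step in enumerate(steps, start=1):
        prefix = "Steps.member.%d" % n
        params["%s.Name" % prefix] = step["name"]
        params["%s.ActionOnFailure" % prefix] = step.get(
            "action_on_failure", "CONTINUE")
        params["%s.HadoopJarStep.Jar" % prefix] = step["jar"]
        if "main_class" in step:
            # Only required when the JAR file has no manifest.
            params["%s.HadoopJarStep.MainClass" % prefix] = step["main_class"]
        for i, arg in enumerate(step.get("args", []), start=1):
            params["%s.HadoopJarStep.Args.member.%d" % (prefix, i)] = arg
    return params

# Reproduce the step from the sample AddJobFlowSteps request.
params = flatten_steps([{
    "name": "MyStep2",
    "jar": "s3://myawsbucket/MySecondJar",
    "main_class": "MainClass",
    "args": ["arg1"],
}])
```

Passing two step dicts would produce a second group of parameters under Steps.member.2, which is how multiple steps are added in a single request.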

When you debug a cluster, you must set the RunJobFlow parameter KeepJobFlowAliveWhenNoSteps to True and ActionOnFailure to CANCEL_AND_WAIT.


Note

The maximum number of steps allowed in a cluster is 256. The debugging option uses additional steps to function, so it can exceed your step limit quickly. For more information about how to overcome this limitation, see Add More than 256 Steps to a Cluster.

To add steps to a cluster using the API

For a detailed example of adding steps to a cluster with the API using Java, see End-to-End Amazon EMR Java Source Code Sample.

  • Use the AddJobFlowSteps action to send a request similar to the following.

    https://elasticmapreduce.amazonaws.com?
    JobFlowId=JobFlowID&
    Operation=AddJobFlowSteps&
    Steps.member.1.Name=MyStep2&
    Steps.member.1.ActionOnFailure=CONTINUE&
    Steps.member.1.HadoopJarStep.Jar=s3://myawsbucket/MySecondJar&
    Steps.member.1.HadoopJarStep.MainClass=MainClass&
    Steps.member.1.HadoopJarStep.Args.member.1=arg1&
    AWSAccessKeyId=AccessKeyID&
    SignatureVersion=2&
    SignatureMethod=HmacSHA256&
    Timestamp=2009-01-28T21%3A51%3A51.000Z&
    Signature=calculated value

    For more information about the parameters unique to AddJobFlowSteps, see AddJobFlowSteps. For more information about the generic parameters in the request, see Common Request Parameters.
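The Signature value in the sample request is computed with Signature Version 2 (SignatureVersion=2, SignatureMethod=HmacSHA256). The following sketch shows the general Signature Version 2 computation: sort the query parameters, percent-encode them, build the string to sign from the HTTP verb, host, and path, and take a Base64-encoded HMAC-SHA256 over it. The secret key and parameter values here are dummies for illustration:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_v2(secret_key, host, path, params):
    # Canonical query string: parameters sorted by name, each name and
    # value percent-encoded per RFC 3986 (only -_.~ left unencoded).
    canonical = "&".join(
        "%s=%s" % (quote(k, safe="-_.~"), quote(str(v), safe="-_.~"))
        for k, v in sorted(params.items()))
    string_to_sign = "\n".join(["GET", host, path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


# Dummy credentials and a subset of the request parameters, for illustration.
signature = sign_v2(
    "dummy-secret-key",
    "elasticmapreduce.amazonaws.com",
    "/",
    {"Operation": "AddJobFlowSteps",
     "JobFlowId": "JobFlowID",
     "AWSAccessKeyId": "AccessKeyID",
     "SignatureVersion": "2",
     "SignatureMethod": "HmacSHA256",
     "Timestamp": "2009-01-28T21:51:51.000Z"})
```

The resulting string is sent as the Signature parameter; because HMAC-SHA256 produces 32 bytes, the Base64-encoded signature is always 44 characters long.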

The response contains the request ID.