This section describes the methods for adding steps to a cluster.
You can add steps to a running cluster only if you set the
KeepJobFlowAliveWhenNoSteps parameter to True when you
create the cluster. This value keeps the cluster running even after all of its
steps have completed.
The Amazon EMR console does not support adding steps to a cluster.
The following procedure creates a simple cluster and then adds a step to the cluster.
To add a step to a cluster using the CLI
Create a cluster:
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --stream
Windows users:
ruby elastic-mapreduce --create --alive --stream
The --stream parameter adds a streaming step using default
parameters. The defaults run the word count example that is available in the
Amazon EMR console. The --alive parameter keeps the cluster running after all steps have completed, until you explicitly terminate it.
The output looks similar to the following.
Created cluster JobFlowID
Add a step:
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce -j JobFlowID \
--jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar \
--main-class org.myorg.WordCount \
--arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br \
--arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br \
--arg hdfs:///cloudburst/output/1 \
--arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 \
--arg 24 --arg 128 --arg 16
Windows users:
ruby elastic-mapreduce -j JobFlowID --jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar --main-class org.myorg.WordCount --arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br --arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br --arg hdfs:///cloudburst/output/1 --arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 --arg 24 --arg 128 --arg 16
This command runs an example cluster step that downloads and runs the JAR file. The arguments are passed to the main function in the JAR file.
If your JAR file has a manifest that names its main class, you do not need to
specify the main class with --main-class; the preceding example specifies it explicitly.
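One rough way to decide whether --main-class can be omitted is to look for a Main-Class entry in the JAR's manifest before you submit the step. The following Python sketch does this with the standard zipfile module only. It is not part of the Amazon EMR CLI, and the function names manifest_main_class and make_demo_jar are hypothetical helpers introduced for illustration.

```python
import io
import zipfile
from typing import Optional


def manifest_main_class(jar_file) -> Optional[str]:
    """Return the Main-Class named in a JAR's manifest, or None if there is none."""
    with zipfile.ZipFile(jar_file) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            return None  # the archive has no manifest at all
    text = raw.decode("utf-8", errors="replace")
    # Normalize line endings, then join continuation lines (they start with one space).
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\n ", "")
    # Main-Class belongs to the main section, which ends at the first blank line.
    main_section = text.split("\n\n", 1)[0]
    for line in main_section.split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None


def make_demo_jar(manifest: Optional[str]) -> io.BytesIO:
    """Hypothetical helper: build a tiny in-memory JAR so the sketch can be tried out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        if manifest is not None:
            jar.writestr("META-INF/MANIFEST.MF", manifest)
        jar.writestr("placeholder.txt", "")
    buf.seek(0)
    return buf
```

If manifest_main_class returns None for your JAR file, pass the class explicitly with --main-class as in the preceding example.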
Example using the API
The steps parameter defines the location and input parameters for
the Hadoop JAR steps that perform the processing on the input data. Each step is identified
by a member number.
Typically, you specify all cluster steps in a RunJobFlow request.
AddJobFlowSteps is useful because it lets you add steps to a cluster
that is already running on its EC2 instances. You typically add steps to modify
the data processing or to aid in debugging when you work interactively
with the cluster, that is, when you add steps while the cluster
execution is paused.
The name parameter helps you distinguish step results, so it is
best to make each name unique. Amazon EMR does not check for the uniqueness of step
names.
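Because Amazon EMR does not enforce unique step names, you may want to make them unique on the client side before you submit the steps. The following Python sketch shows one possible approach that appends a numeric suffix to repeated names. The function name uniquify_step_names and the "-2", "-3" suffix scheme are assumptions made for illustration, not an Amazon EMR convention.

```python
from typing import List


def uniquify_step_names(names: List[str]) -> List[str]:
    """Return the names in order, adding a numeric suffix to any repeated name."""
    taken = set()
    result = []
    for name in names:
        candidate = name
        suffix = 2
        # Keep trying suffixes until the candidate does not collide with an earlier name.
        while candidate in taken:
            candidate = f"{name}-{suffix}"
            suffix += 1
        taken.add(candidate)
        result.append(candidate)
    return result
```

For example, passing ["MyStep", "MyStep", "Cleanup"] keeps the first and third names and renames the second occurrence so that all three are distinct.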
The remainder of the steps parameter specifies the JAR file and
the input parameters used to process the data.
When you debug a cluster, you must set the RunJobFlow parameter
KeepJobFlowAliveWhenNoSteps to True and
ActionOnFailure to CANCEL_AND_WAIT.
Note
The maximum number of steps allowed in a cluster is 256. For more information about how to overcome this limitation, see Add More than 256 Steps to a Cluster.
To add steps to a cluster using the API
Use the AddJobFlowSteps action to send a request similar to the following.
https://elasticmapreduce.amazonaws.com?
JobFlowId=JobFlowID&
Operation=AddJobFlowSteps&
Steps.member.1.Name=MyStep2&
Steps.member.1.ActionOnFailure=CONTINUE&
Steps.member.1.HadoopJarStep.Jar=s3://myawsbucket/MySecondJar&
Steps.member.1.HadoopJarStep.MainClass=MainClass&
Steps.member.1.HadoopJarStep.Args.member.1=arg1&
AWSAccessKeyId=AccessKeyID&
SignatureVersion=2&
SignatureMethod=HmacSHA256&
Timestamp=2009-01-28T21%3A51%3A51.000Z&
Signature=calculated value
For more information about the parameters unique to
AddJobFlowSteps, see AddJobFlowSteps. For more information about the generic parameters in the
request, see Common Request Parameters.
The response contains the request ID.
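The numbered Steps.member.N parameters in the request can be generated from a list of step descriptions. The following Python sketch shows one way to do this with the standard library only. It builds only the parameters specific to AddJobFlowSteps. It does not add the common request parameters (AWSAccessKeyId, Timestamp, Signature, and so on), does not sign the request, and does not send it. The function name add_job_flow_steps_params and the dictionary keys used to describe a step are assumptions made for illustration.

```python
from typing import Dict, List
from urllib.parse import urlencode


def add_job_flow_steps_params(job_flow_id: str, steps: List[dict]) -> Dict[str, str]:
    """Build the AddJobFlowSteps-specific query parameters (unsigned)."""
    params = {"Operation": "AddJobFlowSteps", "JobFlowId": job_flow_id}
    # Member numbers start at 1, matching the sample request.
    for i, step in enumerate(steps, start=1):
        prefix = f"Steps.member.{i}"
        params[f"{prefix}.Name"] = step["Name"]
        params[f"{prefix}.ActionOnFailure"] = step["ActionOnFailure"]
        params[f"{prefix}.HadoopJarStep.Jar"] = step["Jar"]
        # MainClass is optional, for example when the JAR manifest names it.
        if step.get("MainClass"):
            params[f"{prefix}.HadoopJarStep.MainClass"] = step["MainClass"]
        for j, arg in enumerate(step.get("Args", []), start=1):
            params[f"{prefix}.HadoopJarStep.Args.member.{j}"] = arg
    return params


# Values taken from the sample request; JobFlowID is a placeholder.
example_params = add_job_flow_steps_params(
    "JobFlowID",
    [{
        "Name": "MyStep2",
        "ActionOnFailure": "CONTINUE",
        "Jar": "s3://myawsbucket/MySecondJar",
        "MainClass": "MainClass",
        "Args": ["arg1"],
    }],
)
# urlencode percent-encodes reserved characters such as ":" and "/".
example_query = urlencode(example_params)
```

The resulting query string still needs the common request parameters and a calculated signature before it can be sent to the service.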