Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your
cluster. You can specify a step that runs a script when you create your cluster, or you
can add the step later if your cluster is in the WAITING state. For more information
about adding steps, go to Add Steps to a Cluster. For more information about running an interactive
cluster, go to Interactive and Batch Hive Clusters.
If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Create Bootstrap Actions to Install Additional Software (Optional).
If you want to run a script immediately before cluster shutdown, use a shutdown action. For more information about shutdown actions, go to Shutdown Actions.
You can only run multi-step clusters from the CLI and the API. The Amazon EMR console does not support multiple steps.
This section describes how to add a step to run a script.
script-runner.jar takes as its arguments the path to a script, followed by
any additional arguments for the script. The JAR file runs the script with the passed
arguments. script-runner.jar is located at
s3://elasticmapreduce/libs/script-runner/script-runner.jar.
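As a minimal sketch of the argument order described above, the following Python snippet builds a step definition in the same shape as the JSON sample in this section. The helper name `script_runner_step` and the extra arguments `arg1` and `arg2` are hypothetical and used only for illustration; the snippet only constructs and prints the structure and does not call Amazon EMR.

```python
import json

# Location of script-runner.jar as given in this section.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def script_runner_step(name, script_path, *script_args):
    """Build a step definition that runs a script through script-runner.jar.

    Hypothetical helper: the first argument passed to the JAR is the script
    location, followed by any arguments intended for the script itself.
    """
    return {
        "Name": name,
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            "Args": [script_path, *script_args],
        },
    }


step = script_runner_step(
    "My Script Step",
    "s3://myawsbucket/script-path/my_script.sh",
    "arg1",  # hypothetical extra arguments for the script
    "arg2",
)

# Print the step as a one-element JSON list of steps.
print(json.dumps([step], indent=2))
```

Keeping the script path as the first element of `Args` mirrors the description above: the path comes first, and anything after it is passed through to the script.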
The cluster containing a step that runs a script looks similar to the following:
In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --create --alive --name "My Development Jobflow" \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args "s3://myawsbucket/script-path/my_script.sh"
Windows users:
ruby elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://myawsbucket/script-path/my_script.sh"
This cluster runs the script my_script.sh on the master node when the step is
processed.
This section describes the Amazon EMR API Query request needed to add a step
to run a script. The response includes a
<JobFlowID>.
The Amazon EMR JSON sample below contains two steps. The second step specifies
script-runner.jar as its JAR and passes
the location and file name of the script as its argument.
[
  {
    "Name": "streaming cluster",
    "HadoopJarStep":
    {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args":
      [
        "-input", "s3n://elasticmapreduce/samples/wordcount/input",
        "-output", "s3n://myawsbucket",
        "-mapper", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        "-reducer", "aggregate"
      ]
    }
  },
  {
    "Name": "My Script Step",
    "HadoopJarStep":
    {
      "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
      "Args":
      [
        "s3://myawsbucket/script-path/my_script.sh"
      ]
    }
  }
]
This cluster runs the script my_script.sh on the master node when the step is
processed.
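To check which steps in a list like the sample above run a script, the list can be parsed and filtered on the JAR name. The following is a rough sketch under stated assumptions: the function name `script_steps` is hypothetical, the JSON is copied from the sample above, and the check relies only on the JAR path ending in `script-runner.jar`.

```python
import json

# Step list copied from the JSON sample in this section.
STEPS_JSON = """
[
  {"Name": "streaming cluster",
   "HadoopJarStep": {
     "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
     "Args": ["-input", "s3n://elasticmapreduce/samples/wordcount/input",
              "-output", "s3n://myawsbucket",
              "-mapper", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
              "-reducer", "aggregate"]}},
  {"Name": "My Script Step",
   "HadoopJarStep": {
     "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
     "Args": ["s3://myawsbucket/script-path/my_script.sh"]}}
]
"""


def script_steps(steps):
    """Return (step name, script path) pairs for steps that use script-runner.jar."""
    found = []
    for step in steps:
        jar_step = step["HadoopJarStep"]
        # The first argument to script-runner.jar is the script location.
        if jar_step["Jar"].endswith("/script-runner.jar") and jar_step["Args"]:
            found.append((step["Name"], jar_step["Args"][0]))
    return found


steps = json.loads(STEPS_JSON)
for name, script in script_steps(steps):
    print(f"{name}: {script}")
```

Parsing the list this way also confirms that the sample is well-formed JSON before it is submitted in a request.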