Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Run a Script in a Cluster

Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your cluster. You can specify a step that runs a script when you create your cluster, or you can add such a step later if your cluster is in the WAITING state. For more information about adding steps, go to Add Steps to a Cluster. For more information about running an interactive cluster, go to Interactive and Batch Hive Clusters.

If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Create Bootstrap Actions to Install Additional Software (Optional).

If you want to run a script immediately before cluster shutdown, use a shutdown action. For more information about shutdown actions, go to Shutdown Actions.

You can run multi-step clusters only from the CLI and the API. The Amazon EMR console does not support multiple steps.

CLI

This section describes how to add a step that runs a script. The script-runner.jar file takes as its arguments the path to a script, followed by any additional arguments for the script. The JAR file runs the script with the passed arguments. The script-runner.jar file is located at s3://elasticmapreduce/libs/script-runner/script-runner.jar.
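
The argument convention described above can be sketched in Python as a plain data structure. This is a minimal sketch, not part of the Amazon EMR tooling: the helper name script_runner_step and the extra script argument are hypothetical, and the dictionary keys mirror the step JSON shown in the API section of this page.

```python
import json

# Location of script-runner.jar as given in this guide.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def script_runner_step(name, script_uri, *script_args):
    """Build a step definition that runs a script through script-runner.jar.

    Hypothetical helper for illustration. The first JAR argument is the path
    to the script; any remaining arguments are passed on to the script.
    """
    if not script_uri:
        raise ValueError("script_uri is required")
    return {
        "Name": name,
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            "Args": [script_uri, *script_args],
        },
    }


step = script_runner_step(
    "My Script Step",
    "s3://myawsbucket/script-path/my_script.sh",
    "--verbose",  # hypothetical argument consumed by my_script.sh
)
print(json.dumps(step, indent=2))
```

Keeping the script path as the first element of Args makes the ordering requirement explicit: everything after it is handed to the script unchanged.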

To create a cluster containing a step that runs a script, run a command similar to the following from the command line, in the directory where you installed the Amazon EMR CLI. For more information, see the Command Line Interface Reference for Amazon EMR.

  • Linux, UNIX, and Mac OS X users:

    ./elastic-mapreduce --create --alive --name "My Development Jobflow" \
    --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
    --args "s3://myawsbucket/script-path/my_script.sh"

  • Windows users:

    ruby elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://myawsbucket/script-path/my_script.sh"

This cluster runs the script my_script.sh on the master node when the step is processed.
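
The two commands above differ only in how the CLI is launched. As a sketch of that relationship, the following Python snippet assembles both command lines from the same flags. It only builds and prints the argument lists and does not invoke the CLI. The function name create_script_cluster_command is hypothetical; the flags are the ones shown in the commands above.

```python
import shlex

SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def create_script_cluster_command(name, script_uri, windows=False):
    """Return the argument list for creating a cluster with one script step.

    Hypothetical helper: uses only the flags shown in the commands above.
    """
    # On Windows the CLI is launched through the Ruby interpreter.
    launcher = ["ruby", "elastic-mapreduce"] if windows else ["./elastic-mapreduce"]
    return launcher + [
        "--create", "--alive",
        "--name", name,
        "--jar", SCRIPT_RUNNER_JAR,
        "--args", script_uri,
    ]


script = "s3://myawsbucket/script-path/my_script.sh"
linux_cmd = create_script_cluster_command("My Development Jobflow", script)
windows_cmd = create_script_cluster_command("My Development Jobflow", script, windows=True)

# shlex.join quotes for a POSIX shell, so only the Linux form is rendered as
# a single string; the Windows form is printed as a list.
print(shlex.join(linux_cmd))
print(windows_cmd)
```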

API

This section describes the Amazon EMR API Query request needed to add a step to run a script. The response includes a <JobFlowID>.

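The Amazon EMR JSON sample below defines two steps: a streaming step, followed by a step that specifies script-runner.jar as its JAR and passes the location and file name of the script as an argument. In this sample, the JAR is referenced at s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar.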

[
  {
    "Name": "streaming cluster",
    "HadoopJarStep": {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args": [
        "-input",   "s3n://elasticmapreduce/samples/wordcount/input",
        "-output",  "s3n://myawsbucket",
        "-mapper",  "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        "-reducer", "aggregate"
      ]
    }
  },
  {
    "Name": "My Script Step",
    "HadoopJarStep": {
      "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
      "Args": [
        "s3://myawsbucket/script-path/my_script.sh"
      ]
    }
  }
]

This cluster runs the script my_script.sh on the master node when the step is processed.
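
This page does not show the Query request itself. As a rough sketch of how a step definition like the script step above could be mapped onto Query parameters, the Python snippet below flattens that step into dotted parameter names. It assumes the conventional AWS Query list encoding (name.member.n, numbered from 1), which this page does not confirm. The helper flatten_query_params is hypothetical, and the action name, request signing, and other required parameters are omitted.

```python
import json

# The script step from the JSON sample above.
STEPS_JSON = """
[
  {
    "Name": "My Script Step",
    "HadoopJarStep": {
      "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
      "Args": ["s3://myawsbucket/script-path/my_script.sh"]
    }
  }
]
"""


def flatten_query_params(value, prefix):
    """Flatten nested dicts and lists into dotted parameter names.

    Assumption: lists are encoded as "<prefix>.member.<n>" with n starting
    at 1, following the usual AWS Query convention.
    """
    params = {}
    if isinstance(value, dict):
        for key, item in value.items():
            params.update(flatten_query_params(item, f"{prefix}.{key}"))
    elif isinstance(value, list):
        for index, item in enumerate(value, start=1):
            params.update(flatten_query_params(item, f"{prefix}.member.{index}"))
    else:
        params[prefix] = str(value)
    return params


steps = json.loads(STEPS_JSON)
params = flatten_query_params(steps, "Steps")
for name in sorted(params):
    print(f"{name}={params[name]}")
```

The values would still need URL encoding before being placed in a request; that step is left out to keep the mapping itself visible.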