Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)
« PreviousNext »
View the PDF for this guide.Go to the AWS Discussion Forum for this product.Go to the Kindle Store to download this guide in Kindle format.Did this page help you?  Yes | No |  Tell us about it...

Run a Script in a Cluster

Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your cluster. You specify a step that runs a script either when you create your cluster or you can add a step if your cluster is in the WAITING state. For more information about adding steps, go to Submit Work to a Cluster. For more information about running an interactive cluster, go to Interactive and Batch Hive Clusters.

If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Create Bootstrap Actions to Install Additional Software (Optional).

If you want to run a script immediately before cluster shutdown, use a shutdown action. For more information about shutdown actions, go to Shutdown Actions.

CLI

This section describes how to add a step to run a script. The script-runner.jar takes arguments to the path to a script and any additional arguments for the script. The JAR file runs the script with the passed arguments. Script-runner.jar is located at s3://elasticmapreduce/libs/script-runner/script-runner.jar.

The cluster containing a step that runs a script looks similar to the following:

In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

  • Linux, UNIX, and Mac OS X users:

    ./elastic-mapreduce --create --alive --name "My Development Jobflow" \
    --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
    --args "s3://myawsbucket/script-path/my_script.sh"
  • Windows users:

    ruby elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://myawsbucket/script-path/my_script.sh"

This cluster runs the script my_script.sh on the master node when the step is processed.