Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Run a Script in a Cluster

Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your cluster. You can specify a step that runs a script when you create your cluster or, if your cluster is in the WAITING state, add the step afterward. For more information about adding steps, go to Submit Work to a Cluster. For more information about running an interactive cluster, go to Using Hive Interactively or in Batch Mode.

If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Create Bootstrap Actions to Install Additional Software (Optional).

If you want to run a script immediately before cluster shutdown, use a shutdown action. For more information about shutdown actions, go to Shutdown Actions.

Submitting a Custom JAR Step Using the AWS CLI or the Amazon EMR CLI

This section describes how to add a step to run a script. The script-runner.jar file takes as arguments the path to a script and any additional arguments for the script, and then runs the script with those arguments. Script-runner.jar is located at s3://elasticmapreduce/libs/script-runner/script-runner.jar.
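
As a rough illustration of how the additional arguments reach the script, the following sketch shows what a script such as my_script.sh might contain. The function name report_args and the sample argument values are hypothetical and are not part of Amazon EMR. The sketch assumes only that the script receives the additional step arguments as ordinary positional parameters, as described above.

```shell
#!/bin/sh
# Hypothetical contents for my_script.sh (illustration only).
# Assumption: script-runner.jar passes any additional step arguments
# to the script as positional parameters ($1, $2, ...).

report_args() {
  # Print how many arguments were received, then each one on its own line.
  echo "received $# argument(s)"
  for arg in "$@"; do
    echo "arg: $arg"
  done
}

# Forward whatever arguments the script itself was started with.
report_args "$@"
```

Under that assumption, a step whose argument list is ["s3://mybucket/script-path/my_script.sh","input-2014","verbose"] names the script in the first element, and the remaining two values would arrive in the script as $1 and $2.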

The commands that create a cluster containing a step that runs a script look similar to the following examples.

To add a step to run a script using the AWS CLI

  • Type the following command to create a cluster with a step that runs a script:

    aws emr create-cluster --ami-version string \
    --instance-groups InstanceGroupType=string,InstanceCount=integer,InstanceType=string \
    --steps Type=string,Name=string,ActionOnFailure=string,Jar=string,MainClass=string,Args=["arg1","arg2"] \
    --no-auto-terminate

    For example:

    aws emr create-cluster --ami-version 3.1.1 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge \
    --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"] \
    --no-auto-terminate

    This cluster runs the script my_script.sh on the master node when the step is processed.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
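
If the cluster already exists and is in the WAITING state, the same step definition can be submitted with the add-steps command instead of create-cluster. The following is a minimal sketch, not a definitive procedure: the cluster ID j-XXXXXXXXXXXXX is a placeholder, and the DRY_RUN variable is a hypothetical convenience that prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: add a script-runner step to a cluster that is already running.
# CLUSTER_ID is a placeholder (assumption); substitute your own cluster ID.
CLUSTER_ID="j-XXXXXXXXXXXXX"
SCRIPT_RUNNER="s3://elasticmapreduce/libs/script-runner/script-runner.jar"
SCRIPT_PATH="s3://mybucket/script-path/my_script.sh"

# Build the step definition in the shorthand form used with --steps.
# In the create-cluster example the shell strips the inner double quotes,
# so the CLI receives Args=[path]; the same form is built here.
STEP="Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=${SCRIPT_RUNNER},Args=[${SCRIPT_PATH}]"

# DRY_RUN is a hypothetical switch: 1 (the default) only prints the command,
# 0 actually calls the AWS CLI.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
else
  aws emr add-steps --cluster-id "$CLUSTER_ID" --steps "$STEP"
fi
```

Keeping the step definition in a variable makes it easy to review the exact string before it is sent. Run the script once with the default dry run, check the printed command, and then set DRY_RUN=0 to submit the step.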

To add a step to run a script using the Amazon EMR CLI

  • In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "My Development Jobflow" \
      --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
      --args "s3://mybucket/script-path/my_script.sh"

    • Windows users:

      ruby elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://mybucket/script-path/my_script.sh"

    This cluster runs the script my_script.sh on the master node when the step is processed.