Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Run a Script in a Cluster

Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your cluster. You can specify a step that runs a script when you create your cluster, or you can add the step later while your cluster is in the WAITING state. For more information about adding steps, go to Submit Work to a Cluster. For more information about running an interactive cluster, go to Using Hive Interactively or in Batch Mode.
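
The following is a minimal sketch of adding a script step to an existing cluster with the AWS CLI add-steps subcommand. It assumes a placeholder cluster ID (j-EXAMPLE is not a real cluster), and the add_script_step function and its DRY_RUN switch are hypothetical conveniences of this sketch, not AWS CLI features. With DRY_RUN left at its default of 1, the function only prints the command; with DRY_RUN=0 it checks that the cluster is in the WAITING state and then submits the step.

```shell
#!/usr/bin/env bash
# Sketch: add a script-runner step to a cluster that is in the WAITING state.
# add_script_step and DRY_RUN are hypothetical names used only in this sketch.
set -euo pipefail

SCRIPT_RUNNER_JAR="s3://elasticmapreduce/libs/script-runner/script-runner.jar"

add_script_step() {
  local cluster_id="$1" script_path="$2"
  # Keep the command in an array so each argument is passed through intact.
  local cmd=(aws emr add-steps --cluster-id "$cluster_id"
    --steps "Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=${SCRIPT_RUNNER_JAR},Args=[${script_path}]")

  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    # Dry run (the default): print the command instead of calling AWS.
    printf '%s\n' "${cmd[*]}"
    return 0
  fi

  # Confirm the cluster is waiting for work before submitting the step.
  local state
  state="$(aws emr describe-cluster --cluster-id "$cluster_id" \
    --query 'Cluster.Status.State' --output text)"
  if [[ "$state" != "WAITING" ]]; then
    echo "cluster ${cluster_id} is ${state}, not WAITING" >&2
    return 1
  fi
  "${cmd[@]}"
}

# Example (dry run): add_script_step j-EXAMPLE s3://mybucket/script-path/my_script.sh
```

Building the command as an array rather than a single string keeps the --steps value as one argument even if the script path were to contain spaces.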

If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to (Optional) Create Bootstrap Actions to Install Additional Software.

If you want to run a script immediately before cluster shutdown, use a shutdown action. For more information about shutdown actions, go to Shutdown Actions.

Submitting a Custom JAR Step Using the AWS CLI or the Amazon EMR CLI

This section describes how to add a step that runs a script. script-runner.jar takes as arguments the path to a script followed by any additional arguments for that script, and runs the script with those arguments. script-runner.jar is located at s3://elasticmapreduce/libs/script-runner/script-runner.jar.
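
Because script-runner.jar passes everything after the script path to the script, any additional arguments go into the same Args list as the script path. The following sketch builds that list. The script_runner_args function name is hypothetical, and the sketch assumes that no argument contains a comma, because the AWS CLI shorthand syntax uses commas to separate list items.

```shell
#!/usr/bin/env bash
# Sketch: build the Args=[...] list for a script-runner step.
# The first item is the script path; script-runner.jar runs that script with
# the remaining items as its arguments ($1, $2, ...).
# script_runner_args is a hypothetical helper name used only in this sketch.
set -euo pipefail

script_runner_args() {
  local arg
  for arg in "$@"; do
    # The AWS CLI shorthand syntax splits list items on commas, so reject them.
    if [[ "$arg" == *,* ]]; then
      echo "argument contains a comma: ${arg}" >&2
      return 1
    fi
  done
  # Join all arguments with commas inside a subshell so IFS is not changed globally.
  local joined
  joined="$(IFS=,; printf '%s' "$*")"
  printf 'Args=[%s]\n' "$joined"
}
```

For example, script_runner_args s3://mybucket/script-path/my_script.sh s3://mybucket/input nightly prints Args=[s3://mybucket/script-path/my_script.sh,s3://mybucket/input,nightly]. Used in place of the Args item in a --steps value, that list would run my_script.sh with s3://mybucket/input as $1 and nightly as $2 (both illustrative values).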

A command that creates a cluster with a step that runs a script looks similar to the following examples.

To add a step to run a script using the AWS CLI

  • To run a script using the AWS CLI, type the following command, replacing myKey with the name of your EC2 key pair and mybucket with the name of your Amazon S3 bucket. This cluster runs the script my_script.sh on the master node when the step is processed.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig \
      --use-default-roles --ec2-attributes KeyName=myKey \
      --instance-type m3.xlarge --instance-count 3 \
      --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"]
    • Windows users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.3 --applications Name=Hue Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-type m3.xlarge --instance-count 3 --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://mybucket/script-path/my_script.sh"]

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.

    Note

    If you have not previously created the default EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
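
The placeholders in the create-cluster command above can be passed in as parameters, as in the following sketch. The create_cluster_cmd function and its DRY_RUN switch are hypothetical; the command it builds is the one shown above with myKey, mybucket, and the instance count substituted. By default it only prints the command; set DRY_RUN=0 to run it, after creating the default roles with aws emr create-default-roles if you have not done so already.

```shell
#!/usr/bin/env bash
# Sketch: the create-cluster command above with its placeholders as parameters.
# create_cluster_cmd and DRY_RUN are hypothetical names used only in this sketch.
set -euo pipefail

create_cluster_cmd() {
  local key_name="$1"          # name of your EC2 key pair (replaces myKey)
  local bucket="$2"            # your Amazon S3 bucket (replaces mybucket)
  local count="${3:-3}"        # 1 master node + (count - 1) core nodes
  local script="s3://${bucket}/script-path/my_script.sh"

  local cmd=(aws emr create-cluster --name "Test cluster" --ami-version 3.3
    --applications Name=Hue Name=Hive Name=Pig
    --use-default-roles --ec2-attributes "KeyName=${key_name}"
    --instance-type m3.xlarge --instance-count "$count"
    --steps "Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://elasticmapreduce/libs/script-runner/script-runner.jar,Args=[${script}]")

  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    # Dry run (the default): print the command instead of creating a cluster.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, create_cluster_cmd my-key my-bucket 5 would request one master node and four core nodes, all of type m3.xlarge.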

To add a step to run a script using the Amazon EMR CLI

  • In the directory where you installed the Amazon EMR CLI, type the following command. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "My Development Jobflow" \
      --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
      --args "s3://mybucket/script-path/my_script.sh"
    • Windows users:

      ruby elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://mybucket/script-path/my_script.sh"

    This cluster runs the script my_script.sh on the master node when the step is processed.