Menu
Amazon EMR
Management Guide

(Optional) Create Bootstrap Actions to Install Additional Software

You can use a bootstrap action to install additional software on your cluster. Bootstrap actions are scripts that are run on the cluster nodes when Amazon EMR launches the cluster. They run before Amazon EMR installs specified applications and the node begins processing data. If you add nodes to a running cluster, bootstrap actions run on those nodes also. You can create custom bootstrap actions and specify them when you create your cluster.

Note

Most predefined bootstrap actions for Amazon EMR AMI versions 2.x and 3.x are not supported in Amazon EMR releases 4.x. For example, configure-Hadoop and configure-daemons are not supported in Amazon EMR release 4.x. Instead, Amazon EMR release 4.x natively provides this functionality. For more information about how to migrate bootstrap actions from Amazon EMR AMI versions 2.x and 3.x to Amazon EMR release 4.x, go to Differences Introduced in 4.x in the Amazon EMR Release Guide.

Bootstrap Action Basics

Bootstrap actions execute as the Hadoop user by default. You can execute a bootstrap action with root privileges by using sudo.

All Amazon EMR management interfaces support bootstrap actions. You can specify up to 16 bootstrap actions per cluster by providing multiple bootstrap-action parameters from the console, AWS CLI, or API.

From the Amazon EMR console, you can optionally specify a bootstrap action while creating a cluster.

When you use the CLI, you can pass references to bootstrap action scripts to Amazon EMR by adding the --bootstrap-action parameter when you create the cluster using the create-cluster command. The syntax for a --bootstrap-action parameter is as follows:

AWS CLI

--bootstrap-action Path=s3://mybucket/filename",Args=[arg1,arg2]

If the bootstrap action returns a nonzero error code, Amazon EMR treats it as a failure and terminates the instance. If too many instances fail their bootstrap actions, then Amazon EMR terminates the cluster. If just a few instances fail, Amazon EMR attempts to reallocate the failed instances and continue. Use the cluster lastStateChangeReason error code to identify failures caused by a bootstrap action.

Run If Bootstrap Action

Amazon EMR provides this predefined bootstrap action to run a command conditionally when an instance-specific value is found in the instance.json or job-flow.json file. The command can refer to a file in Amazon S3 that Amazon EMR can download and execute.

The location of the script is s3://elasticmapreduce/bootstrap-actions/run-if.

The following example echoes the string "running on master node" if the node is a master.

To run a command conditionally using the AWS CLI

When using the AWS CLI to include a bootstrap action, specify the Path and Args as a comma-separated list.

  • To launch a cluster with a bootstrap action that conditionally runs a command when an instance-specific value is found in the instance.json or job-flow.json file, type the following command and replace myKey with the name of your EC2 key pair.

    aws emr create-cluster --name "Test cluster" --release-label emr-4.0.0 --use-default-roles --ec2-attributes KeyName=myKey --applications Name=Hive --instance-count 1 --instance-type m3.xlarge --bootstrap-action Path=s3://elasticmapreduce/bootstrap-actions/run-if,Args=["instance.isMaster=true","echo running on master node"]

    When you specify the instance count without using the --instance-groups parameter, a single Master node is launched, and the remaining instances are launched as core nodes. All nodes will use the instance type specified in the command.

    Note

    If you have not previously created the default Amazon EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.

Shutdown Actions

A bootstrap action script can create one or more shutdown actions by writing scripts to the /mnt/var/lib/instance-controller/public/shutdown-actions/ directory. When a cluster is terminated, all the scripts in this directory are executed in parallel. Each script must run and complete within 60 seconds.

Shutdown action scripts are not guaranteed to run if the node terminates with an error.

Note

When using Amazon EMR versions 4.0 and later, you must manually create the /mnt/var/lib/instance-controller/public/shutdown-actions/ directory on the master node. It doesn't exist by default; however, after being created, scripts in this directory nevertheless run before shutdown. For more information about connecting to the Master node to create directories, see Connect to the Master Node Using SSH.

Use Custom Bootstrap Actions

You can create a custom script to perform a customized bootstrap action. Any of the Amazon EMR interfaces can reference a custom bootstrap action.

Add Custom Bootstrap Actions Using the AWS CLI or the Amazon EMR CLI

The following example uses a bootstrap action script to download and extracts a compressed TAR archive from Amazon S3. The sample script is stored at http://elasticmapreduce.s3.amazonaws.com/bootstrap-actions/download.sh.

The sample script looks like the following:

#!/bin/bash set -e wget -S -T 10 -t 5 http://elasticmapreduce.s3.amazonaws.com/bootstrap-actions/file.tar.gz mkdir -p /home/hadoop/contents tar -xzf file.tar.gz -C /home/hadoop/contents

To create a cluster with a custom bootstrap action using the AWS CLI

When using the AWS CLI to include a bootstrap action, specify the Path and Args as a comma-separated list. The following example does not use an arguments list.

  • To launch a cluster with a custom bootstrap action, type the following command, replace myKey with the name of your EC2 key pair.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --release-label emr-4.0.0 \ --use-default-roles --ec2-attributes KeyName=myKey \ --applications Name=Hive Name=Pig \ --instance-count 3 --instance-type m3.xlarge \ --bootstrap-action Path="s3://elasticmapreduce/bootstrap-actions/download.sh"
    • Windows users:

      aws emr create-cluster --name "Test cluster" --release-label emr-4.2.0 --use-default-roles --ec2-attributes KeyName=myKey --applications Name=Hive Name=Pig --instance-count 3 --instance-type m3.xlarge --bootstrap-action Path="s3://elasticmapreduce/bootstrap-actions/download.sh"

    When you specify the instance count without using the --instance-groups parameter, a single Master node is launched, and the remaining instances are launched as core nodes. All nodes will use the instance type specified in the command.

    Note

    If you have not previously created the default Amazon EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.

Add Custom Bootstrap Actions Using the Console

The following procedure describes how to use your own custom bootstrap action.

To create a cluster with a custom bootstrap action using the console

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose Create cluster.

  3. Click Go to advanced options.

  4. In Create Cluster - Advanced Options, Steps 1 and 2 choose the options as desired and proceed to Step 3: General Cluster Settings.

  5. Under Bootstrap Actions select Configure and add to specify the Name, JAR location, and arguments for your bootstrap action. Choose Add.

  6. Optionally add more bootstrap actions as desired.

  7. Proceed to create the cluster. Your bootstrap action(s) will be performed after the cluster has been provisioned and initialized.

While the cluster's master node is running, you can connect to the master node and see the log files that the bootstrap action script generated in the /mnt/var/log/bootstrap-actions/1 directory.

Related Topics

Use a Custom Bootstrap Action to Copy an Object from Amazon S3 to Each Node

You can use a bootstrap action to copy objects from Amazon S3 to each node in a cluster before your applications are installed. The AWS CLI is installed on each node of a cluster, so your bootstrap action can call AWS CLI commands.

The following example demonstrates a simple bootstrap action script that copies a file, myfile.jar, from Amazon S3 to a local folder, /mnt1/myfolder, on each cluster node. The script is saved to Amazon S3 with the file name copymyfile.sh with the following contents.

aws s3 cp s3://mybucket/myfilefolder/myfile.jar /mnt1/myfolder

When you launch the cluster, you specify the script. The following AWS CLI example demonstrates this:

aws emr create-cluster --name "Test cluster" --release-label emr-4.1.0emr-4.2.0 \ --use-default-roles --ec2-attributes KeyName=myKey \ --applications Name=Hive Name=Pig \ --instance-count 3 --instance-type m3.xlarge \ --bootstrap-action Path="s3://mybucket/myscriptfolder/copymyfile.sh"