You can create clusters consisting of multiple steps using the Amazon EMR command line interface (CLI). The Amazon EMR console supports creating only single-step clusters. This document primarily describes how to manage clusters with the Amazon EMR CLI. For more information about how to use the Amazon EMR console and the Amazon EMR API, see the Amazon Elastic MapReduce Developer Guide and the Amazon Elastic MapReduce API Reference.
The Amazon EMR CLI requires Ruby 1.8.7 and is not compatible with later versions of Ruby. After you have installed Ruby, unzip elastic-mapreduce-ruby.zip into a directory, and the Amazon EMR CLI is ready to use.
To install Ruby
Download and install Ruby 1.8.7:
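Linux and UNIX users can download Ruby from http://www.ruby-lang.org/en/news/2010/06/23/ruby-1-8-7-p299-released/. On Debian-based distributions, you can instead install Ruby by entering the following command, and then confirm in the next step that the installed version is 1.8.7: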
sudo apt-get install ruby-full
Windows users can install Ruby 1.8.7 from http://rubyforge.org/frs/?group_id=167&release_id=28426. During the installation process, select the checkboxes to add Ruby executables to your PATH environment variable and to associate .rb files with this Ruby installation.
Mac OS X comes with Ruby installed. You can check the version as shown in the following step.
Verify that Ruby is running by typing the following at the command prompt:
ruby -v
The Ruby version is shown, confirming that you installed Ruby. The output should be similar to the following:
ruby 1.8.7 (2012-02-08 patchlevel 358) [universal-darwin11.0]
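Because the CLI works only with Ruby 1.8.7, it can help to check the version string in a script before continuing. The following is a minimal shell sketch, not part of the Amazon EMR CLI. The function name `is_ruby_187` is made up for illustration, and the check assumes that `ruby -v` prints a line beginning with `ruby 1.8.7`, as in the sample output above.

```shell
# Return success only if the given "ruby -v" output line reports a 1.8.7 release.
# (is_ruby_187 is a hypothetical helper name, not an Amazon EMR CLI command.)
is_ruby_187() {
  case "$1" in
    "ruby 1.8.7"*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v ruby >/dev/null 2>&1; then
  version_line="$(ruby -v)"
  if is_ruby_187 "$version_line"; then
    echo "Supported Ruby found: $version_line"
  else
    echo "Unsupported Ruby version for the Amazon EMR CLI: $version_line"
  fi
else
  echo "ruby was not found on your PATH"
fi
```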
To download the Amazon EMR CLI
Create a new directory to install the Amazon EMR CLI into. From the command-line prompt, enter the following:
mkdir elastic-mapreduce-cli
Download the Amazon EMR files:
Go to http://aws.amazon.com/developertools/2264. If you are not logged in to AWS, enter your AWS account credentials when prompted.
Click Download.
Save the file in your newly created directory.
To install the Amazon EMR CLI
Navigate to your elastic-mapreduce-cli directory.
Unzip the compressed file:
Linux, UNIX, and Mac OS X users, from the command-line prompt, enter the following:
unzip elastic-mapreduce-ruby.zip
Windows users, from Windows Explorer, open the elastic-mapreduce-ruby.zip file and select Extract all files.
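For Linux, UNIX, and Mac OS X users, the directory creation and unzip steps above can be combined into one short script. This is only a sketch: it assumes you have already downloaded elastic-mapreduce-ruby.zip into the elastic-mapreduce-cli directory, and the variable names `CLI_DIR` and `ARCHIVE` are chosen here for illustration.

```shell
# Directory and archive names follow the examples in this guide;
# CLI_DIR and ARCHIVE are illustrative variable names.
CLI_DIR="${CLI_DIR:-elastic-mapreduce-cli}"
ARCHIVE="${ARCHIVE:-elastic-mapreduce-ruby.zip}"

# Create the installation directory if it does not exist yet.
mkdir -p "$CLI_DIR"

if [ -f "$CLI_DIR/$ARCHIVE" ]; then
  # Unpack in place; -o overwrites existing files without prompting.
  (cd "$CLI_DIR" && unzip -o "$ARCHIVE")
  echo "Unpacked $ARCHIVE into $CLI_DIR"
else
  echo "Download $ARCHIVE into $CLI_DIR first, then run this script again"
fi
```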
The Amazon EMR credentials file can provide information required for many commands. You can also store command parameters in the file so you don't have to repeatedly enter that information at the command line each time you create a cluster.
Your credentials are used to calculate the signature value for every request you
make. Amazon EMR automatically looks for your credentials in the file
credentials.json. It is convenient to edit the
credentials.json file and include your AWS credentials. An AWS key pair
is a security credential similar to a password, which you use to securely connect to
your instance when it is running. We recommend that you create a new key pair to use
with this guide.
To create your credentials file
Create a file named credentials.json in the directory where you unzipped the Amazon EMR CLI.
Add the following lines to your credentials file:
{
"access_id": "Your AWS Access Key ID",
"private_key": "Your AWS Secret Access Key",
"keypair": "Your key pair name",
"key-pair-file": "The path and name of your PEM file",
"log_uri": "A path to a bucket you own on Amazon S3, such as, s3n://mylog-uri/",
"region": "The region of your cluster, either us-east-1, us-west-2, us-west-1, eu-west-1, ap-northeast-1, ap-southeast-1, ap-southeast-2, or sa-east-1"
}
Note the name of the region. You use this region to create your Amazon EC2 key pair and your Amazon S3 bucket.
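If you prefer to create the file from the command line, the following shell sketch writes a template with the same keys and leaves the values for you to fill in. Several details are assumptions rather than requirements from this guide: the `CLI_DIR` variable, the example values for `log_uri` and `region`, the `chmod 600` step (a common precaution for files that hold secrets), and the hypothetical helper `has_placeholders`, which only reports whether template text is still present.

```shell
# CLI_DIR is an illustrative variable; point it at the directory
# where you unzipped the Amazon EMR CLI.
CLI_DIR="${CLI_DIR:-elastic-mapreduce-cli}"
CRED_FILE="$CLI_DIR/credentials.json"

mkdir -p "$CLI_DIR"

# Write the template only if no credentials file exists, so that an
# edited file is never overwritten.
if [ ! -e "$CRED_FILE" ]; then
  cat > "$CRED_FILE" <<'EOF'
{
  "access_id": "Your AWS Access Key ID",
  "private_key": "Your AWS Secret Access Key",
  "keypair": "Your key pair name",
  "key-pair-file": "The path and name of your PEM file",
  "log_uri": "s3n://mylog-uri/",
  "region": "us-east-1"
}
EOF
fi

# Assumption: restrict the file to its owner because it holds a secret key.
chmod 600 "$CRED_FILE"

# Hypothetical helper: succeeds if template placeholder text is still present.
has_placeholders() {
  grep -Eq '"(Your|The path) ' "$1"
}

if has_placeholders "$CRED_FILE"; then
  echo "Edit $CRED_FILE and replace the placeholder values"
fi
```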
The next sections explain how to create and find your credentials.
AWS uses security credentials to help protect your data. This section shows you
how to view your security credentials so you can add them to your
credentials.json file.
AWS assigns you an Access Key ID and a Secret Access Key. You include your Access Key ID in all AWS service requests to identify yourself as the sender of the request.
Note
Your Secret Access Key is a shared secret between you and AWS. Keep this key secret; we use it to bill you for the AWS services you use. Never include the key in your requests to AWS, and never email it to anyone, even if an inquiry appears to originate from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your Secret Access Key.
To locate your AWS Access Key ID and AWS Secret Access Key
Go to the AWS website at http://aws.amazon.com.
Click My Account to display a list of options.
Click Security Credentials and log in to your AWS account. Your Access Key ID is displayed in the Access Credentials section. Your Secret Access Key remains hidden as a further precaution.
To display your Secret Access Key, click Show in the Your Secret Access Key area, as shown in the following figure.

Set your access_id parameter to the value of your Access
Key ID and set your private_key parameter to the value of your
Secret Access Key.
To create an Amazon EC2 key pair
Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
From the EC2 Dashboard, select the region you used in your credentials.json file, then click Key Pairs.
On the Key Pairs page, click Create Key Pair.
Enter a name for your key pair, such as mykeypair.
Click Create.
Save the resulting PEM file in a safe location.
In your credentials.json file, set the
keypair parameter to your Amazon EC2 key pair name and
set the key-pair-file parameter to the path and name of
your PEM file. By default, the CLI uses this key pair for the EC2 instances it creates when it launches a cluster.
The log_uri parameter specifies a location in Amazon S3 for
the results and log files that Amazon EMR writes from your cluster. Set the value of the
log_uri parameter to an Amazon S3 bucket that you create
for this purpose.
To create an Amazon S3 bucket
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
Click Create Bucket.
In the Create a Bucket dialog box, enter a bucket name, such as mylog-uri.
The name must be globally unique; it cannot match the name of any existing bucket. For more information about valid bucket names, see http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.
Select the Region for your bucket.
| If your Amazon EMR region is... | Select the Amazon S3 region... |
|---|---|
| us-east-1 | US Standard |
| us-west-2 | Oregon |
| us-west-1 | Northern California |
| eu-west-1 | Ireland |
| ap-northeast-1 | Japan |
| ap-southeast-1 | Singapore |
| ap-southeast-2 | Sydney |
| sa-east-1 | Sao Paulo |
| us-gov-west-1 | GovCloud |
Note
To use the AWS GovCloud region, contact your AWS business representative. You can’t create an AWS GovCloud account on the AWS website. You must engage directly with AWS and sign an AWS GovCloud (US) Enterprise Agreement. For more information, see the AWS GovCloud (US) Product Page.
Click Create.
Note
If you enable logging in the Create a Bucket wizard, it enables only bucket access logs, not Amazon EMR cluster logs.
You have created a bucket with the URI s3n://mylog-uri/.
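Before you choose a bucket name, you can screen candidates locally. The sketch below checks a subset of the DNS-compliant naming conventions for Amazon S3: 3 to 63 characters; only lowercase letters, digits, periods, and hyphens; a letter or digit at each end; no adjacent periods; and not formatted like an IP address. The function name `is_valid_bucket_name` is made up for this example, the check cannot tell whether a name is already taken, and the bucket restrictions page linked above remains the authoritative source.

```shell
# Hypothetical helper: succeeds if the name passes a subset of the
# DNS-compliant bucket naming conventions described in the lead-in.
is_valid_bucket_name() {
  name="$1"
  len=${#name}

  # Length must be between 3 and 63 characters.
  if [ "$len" -lt 3 ] || [ "$len" -gt 63 ]; then
    return 1
  fi

  # Lowercase letters, digits, periods, and hyphens only;
  # the first and last characters must be a letter or digit.
  printf '%s\n' "$name" | grep -Eq '^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$' || return 1

  # No two adjacent periods.
  case "$name" in
    *..*) return 1 ;;
  esac

  # Must not look like an IP address, such as 192.168.0.1.
  if printf '%s\n' "$name" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
    return 1
  fi

  return 0
}

for candidate in mylog-uri MyLogURI ab; do
  if is_valid_bucket_name "$candidate"; then
    echo "$candidate: passes the local checks"
  else
    echo "$candidate: fails the local checks"
  fi
done
```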
After creating your bucket, set the appropriate permissions on it. Typically, you give yourself (the owner) read and write access and give authenticated users read access.
To set permissions on an Amazon S3 bucket
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
In the Buckets pane, right-click the bucket you just created.
Select Properties.
In the Properties pane, select the Permissions tab.
Click Add more permissions.
Select Authenticated Users in the Grantee field.
To the right of the Grantee field, select List.
Click Save.
You have now created a bucket and assigned it permissions. Set the
log_uri parameter in your credentials.json file to this bucket's URI so that
Amazon EMR uploads your logs and results there.
Configure your SSH credentials for use with either SSH or PuTTY. This step is required.
To configure your SSH credentials
Configure your computer to use SSH:
Linux, UNIX, and Mac OS X users, set the permissions on the PEM file for your Amazon
EC2 key pair. For example, if you saved the file as
mykeypair.pem, the command looks like:
chmod og-rwx mykeypair.pem
Windows users
Windows users use PuTTY to connect to the master node. Download PuTTYgen.exe to your computer from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.
Launch PuTTYgen.
Click Load. Select the PEM file you created earlier.
Click Open.
Click OK on the PuTTYgen Notice telling you the key was successfully imported.
Click Save private key to save the key in the PPK format.
When PuTTYgen prompts you to save the key without a pass phrase, click Yes.
Enter a name for your PuTTY private key, such as mykeypair.ppk.
Click Save.
Exit the PuTTYgen application.
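Linux, UNIX, and Mac OS X users can confirm that the chmod step took effect by inspecting the mode string that `ls -ld` prints. This is an illustrative sketch: `pem_is_private` is a hypothetical helper name, `mykeypair.pem` is the example file name used in this guide, and the check assumes the usual `ls -l` layout in which characters 5 through 10 hold the group and other permission bits.

```shell
# Example file name from this guide; override PEM_FILE for your own key.
PEM_FILE="${PEM_FILE:-mykeypair.pem}"

# Hypothetical helper: succeeds if group and other have no permissions.
pem_is_private() {
  # In a mode string such as -rw-------, characters 5-10 are the
  # group and other permission bits.
  mode="$(ls -ld "$1" | cut -c5-10)"
  [ "$mode" = "------" ]
}

if [ -f "$PEM_FILE" ]; then
  chmod og-rwx "$PEM_FILE"
  if pem_is_private "$PEM_FILE"; then
    echo "$PEM_FILE is readable only by its owner"
  else
    echo "$PEM_FILE is still accessible to group or other users"
  fi
else
  echo "$PEM_FILE not found; set PEM_FILE to the path of your PEM file"
fi
```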
To verify installation of the Amazon EMR CLI
In the directory where you installed the Amazon EMR CLI, run the following commands from the command line:
Linux, UNIX, and Mac OS X users:
./elastic-mapreduce --version
Windows users:
ruby elastic-mapreduce --version
If the CLI is correctly installed and the credentials properly configured, the CLI should display its version number represented as a date. The output should look similar to the following:
Version 2012-12-17
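If you want to script this check on Linux, UNIX, or Mac OS X, the sketch below runs the version command and tests whether the output has the `Version YYYY-MM-DD` shape shown above. The helper name `looks_like_cli_version` is made up for illustration, and the exact output format is assumed from the sample line rather than from a documented guarantee.

```shell
# Hypothetical helper: succeeds if the text looks like "Version 2012-12-17".
looks_like_cli_version() {
  printf '%s\n' "$1" | grep -Eq '^Version [0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

# Run this from the directory where you installed the Amazon EMR CLI.
if [ -x ./elastic-mapreduce ]; then
  output="$(./elastic-mapreduce --version 2>&1 || true)"
  if looks_like_cli_version "$output"; then
    echo "Amazon EMR CLI is installed: $output"
  else
    echo "Unexpected output; check your installation and credentials.json:"
    echo "$output"
  fi
else
  echo "elastic-mapreduce was not found in the current directory"
fi
```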