Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Install the Amazon EMR Command Line Interface

To install the command line interface, complete the following tasks:

Installing Ruby

The Amazon EMR CLI works with Ruby versions 1.8.7, 1.9.3, and 2.0. If your machine does not have Ruby installed, download one of those versions for use with the CLI.

To install Ruby

  1. Download and install one of the supported Ruby versions for your operating system.

  2. Verify that Ruby is running by typing the following at the command prompt:

    ruby -v

    The Ruby version is shown, confirming that you installed Ruby. The output should be similar to the following:

    ruby 1.8.7 (2012-02-08 patchlevel 358) [universal-darwin11.0]
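The version check above can also be scripted. The following is a minimal sketch, not part of the CLI: the constant and method names are made up for illustration, and it assumes that a patch release such as 2.0.0 counts as matching the listed version 2.0.

```ruby
# Ruby versions this guide lists as working with the Amazon EMR CLI.
SUPPORTED_RUBY_VERSIONS = ["1.8.7", "1.9.3", "2.0"]

# Returns true when the version string equals a listed version or is a
# more specific release of it (assumption: "2.0.0" matches "2.0").
def supported_ruby?(version)
  SUPPORTED_RUBY_VERSIONS.any? do |listed|
    version == listed || version.start_with?(listed + ".")
  end
end

status = supported_ruby?(RUBY_VERSION) ? "listed" : "not listed"
puts "Ruby #{RUBY_VERSION} is #{status} in this guide"
```

Running the script with the Ruby interpreter you plan to use for the CLI reports whether that interpreter is one of the listed versions.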

Verifying the RubyGems package management framework

The Amazon EMR CLI requires RubyGems version 1.8 or later.

To verify the RubyGems installation and version

  • To check whether RubyGems is installed, run the following command from a terminal window. If RubyGems is installed, this command displays its version information.

    gem -v

If you don't have RubyGems installed, you must download and install it before you can install the Amazon EMR CLI.

To install RubyGems on Linux/Unix/Mac OS

  1. Download and extract RubyGems version 1.8 or later from http://rubyforge.org/frs/?group_id=126.

  2. From the directory where you extracted RubyGems, install it using the following command.

    sudo ruby setup.rb
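Once RubyGems is installed, the 1.8 minimum can be confirmed from Ruby itself. This is a small sketch: `Gem::VERSION` and `Gem::Version` come from RubyGems, while `MINIMUM_RUBYGEMS` and `rubygems_ok?` are names chosen here for illustration.

```ruby
require "rubygems"

# Minimum RubyGems version stated in this guide.
MINIMUM_RUBYGEMS = Gem::Version.new("1.8")

# Compares versions segment by segment, so "1.10.0" correctly ranks
# above "1.8" (a plain string comparison would get this wrong).
def rubygems_ok?(version_string)
  Gem::Version.new(version_string) >= MINIMUM_RUBYGEMS
end

puts "RubyGems #{Gem::VERSION} meets the 1.8 minimum: #{rubygems_ok?(Gem::VERSION)}"
```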

Installing the Command Line Interface

To download the Amazon EMR CLI

  1. Create a new directory to install the Amazon EMR CLI into. From the command-line prompt, enter the following:

    mkdir elastic-mapreduce-cli

  2. Download the Amazon EMR files:

    1. Go to http://aws.amazon.com/developertools/2264.

    2. Click Download.

    3. Save the file in your newly created directory.

To install the Amazon EMR CLI

  1. Navigate to your elastic-mapreduce-cli directory.

  2. Unzip the compressed file:

    • Linux, UNIX, and Mac OS X users: from the command-line prompt, enter the following:

      unzip elastic-mapreduce-ruby.zip

    • Windows users: from Windows Explorer, open the elastic-mapreduce-ruby.zip file and select Extract all files.

Configuring Credentials

The Amazon EMR credentials file can provide information required for many commands. You can also store command parameters in the file so you don't have to repeatedly enter that information at the command line each time you create a cluster.

Your credentials are used to calculate the signature value for every request you make. The Amazon EMR CLI automatically looks for your credentials in the file credentials.json, so it is convenient to store your AWS credentials there. The file also names an Amazon EC2 key pair, a security credential similar to a password, which you use to securely connect to your instance when it is running. We recommend that you create a new key pair to use with this guide.

To create your credentials file

  1. Create a file named credentials.json in the directory where you unzipped the Amazon EMR CLI.

  2. Add the following lines to your credentials file:

    {
      "access_id": "Your AWS Access Key ID",
      "private_key": "Your AWS Secret Access Key",
      "key-pair": "Your key pair name",
      "key-pair-file": "The path and name of your PEM file",
      "log_uri": "A path to a bucket you own on Amazon S3, such as, s3n://mylog-uri/",
      "region": "The region of your cluster, either us-east-1, us-west-2, us-west-1, eu-west-1, ap-northeast-1, ap-southeast-1, ap-southeast-2, or sa-east-1"
    }

    Note the name of the region. You use this region to create your Amazon EC2 key pair and your Amazon S3 bucket.
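Before running the CLI, it can help to confirm that credentials.json parses as JSON and contains every key shown in the sample. The sketch below is an illustration, not part of the CLI; treating all six sample keys as required is an assumption, and the method name `missing_credentials` is hypothetical.

```ruby
require "json"

# Keys that appear in the sample credentials.json in this guide.
# Assumption: every one of them should be present and non-empty.
EXPECTED_KEYS = %w[access_id private_key key-pair key-pair-file log_uri region]

# Parses the JSON text and returns the expected keys that are
# missing or whose value is not a non-empty string.
def missing_credentials(json_text)
  data = JSON.parse(json_text)
  EXPECTED_KEYS.reject do |key|
    data[key].is_a?(String) && !data[key].strip.empty?
  end
end

path = "credentials.json"
if File.exist?(path)
  missing = missing_credentials(File.read(path))
  if missing.empty?
    puts "All expected keys are present."
  else
    puts "Missing or empty: #{missing.join(', ')}"
  end
else
  puts "#{path} not found in the current directory."
end
```

Run the script from the directory where you unzipped the CLI. A `JSON::ParserError` indicates a syntax problem in the file, such as a missing comma or quotation mark.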

The next sections explain how to create and find your credentials.

AWS Security Credentials

AWS uses security credentials to help protect your data. This section shows you how to view your security credentials so you can add them to your credentials.json file.

For CLI access, you need an access key ID and secret access key. Use IAM user access keys instead of AWS root account access keys. IAM lets you securely control access to AWS services and resources in your AWS account. For more information about creating access keys, see How Do I Get Security Credentials? in the AWS General Reference.

Set your access_id parameter to the value of your access key ID and set your private_key parameter to the value of your secret access key.

To create an Amazon EC2 key pair

  1. Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.

  2. From the EC2 Dashboard, select the region you used in your credentials.json file, then click Key Pairs.

  3. On the Key Pairs page, click Create Key Pair.

  4. Enter a name for your key pair, such as mykeypair.

  5. Click Create.

  6. Save the resulting PEM file in a safe location.

In your credentials.json file, set the key-pair parameter to your Amazon EC2 key pair name and set the key-pair-file parameter to the location and name of your PEM file. The CLI uses this key pair by default for the EC2 instances it creates when it launches a cluster.

Amazon S3 Bucket

The log_uri parameter specifies a location in Amazon S3 for the Amazon EMR results and log files from your cluster. Its value is the URI of an Amazon S3 bucket that you create for this purpose.

To create an Amazon S3 bucket

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. Click Create Bucket.

  3. In the Create a Bucket dialog box, enter a bucket name, such as mylog-uri.

    The name must be globally unique; it cannot be the same as the name of any other bucket. For more information about valid bucket names, see http://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html.

  4. Select the Region for your bucket.

    If your Amazon EMR region is...   Select the Amazon S3 region...
    us-east-1                         US Standard
    us-west-2                         Oregon
    us-west-1                         Northern California
    eu-west-1                         Ireland
    ap-northeast-1                    Japan
    ap-southeast-1                    Singapore
    ap-southeast-2                    Sydney
    sa-east-1                         Sao Paulo
    us-gov-west-1                     GovCloud

    Note

    To use the AWS GovCloud region, contact your AWS business representative. You can't create an AWS GovCloud account on the AWS website. You must engage directly with AWS and sign an AWS GovCloud (US) Enterprise Agreement. For more information, see the AWS GovCloud (US) Product Page.

  5. Click Create.

    Note

    If you enable logging in the Create a Bucket wizard, it enables only bucket access logs, not Amazon EMR cluster logs.

You have created a bucket with the URI s3n://mylog-uri/.
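The region table and the bucket URI format used in this procedure can be captured in a few lines of Ruby. This is only a sketch for cross-checking your settings: the hash copies the table in this guide, and the names `S3_REGION_NAMES`, `s3_region_name`, and `log_uri_for` are invented for the example.

```ruby
# Amazon EMR region => Amazon S3 region name, as listed in this guide.
S3_REGION_NAMES = {
  "us-east-1"      => "US Standard",
  "us-west-2"      => "Oregon",
  "us-west-1"      => "Northern California",
  "eu-west-1"      => "Ireland",
  "ap-northeast-1" => "Japan",
  "ap-southeast-1" => "Singapore",
  "ap-southeast-2" => "Sydney",
  "sa-east-1"      => "Sao Paulo",
  "us-gov-west-1"  => "GovCloud"
}

# Looks up the S3 console region name for an EMR region, raising an
# error for regions that the table does not list.
def s3_region_name(emr_region)
  S3_REGION_NAMES.fetch(emr_region) do
    raise ArgumentError, "Region not listed in this guide: #{emr_region}"
  end
end

# Builds the s3n:// URI form that this guide uses for the log location.
def log_uri_for(bucket_name)
  "s3n://#{bucket_name}/"
end

puts s3_region_name("us-west-2")   # prints "Oregon"
puts log_uri_for("mylog-uri")      # prints "s3n://mylog-uri/"
```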

After creating your bucket, set the appropriate permissions on it. Typically, you give yourself (the owner) read and write access and give authenticated users read access.

To set permissions on an Amazon S3 bucket

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In the Buckets pane, right-click the bucket you just created.

  3. Select Properties.

  4. In the Properties pane, select the Permissions tab.

  5. Click Add more permissions.

  6. Select Authenticated Users in the Grantee field.

  7. To the right of the Grantee field, select List.

  8. Click Save.

You have now created a bucket and assigned it permissions. Set the log_uri parameter in your credentials.json file to this bucket's URI so that Amazon EMR uploads your logs and results there.

SSH Setup and Configuration

Configure your SSH credentials for use with either SSH or PuTTY. This step is required.

To configure your SSH credentials

  • Configure your computer to use SSH:

    • Linux, UNIX, and Mac OS X users, set the permissions on the PEM file for your Amazon EC2 key pair. For example, if you saved the file as mykeypair.pem, the command looks like:

      chmod og-rwx mykeypair.pem

    • Windows users:

      1. Windows users use PuTTY to connect to the master node. Download PuTTYgen.exe to your computer from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

      2. Launch PuTTYgen.

      3. Click Load. Select the PEM file you created earlier.

      4. Click Open.

      5. Click OK on the PuTTYgen Notice telling you the key was successfully imported.

      6. Click Save private key to save the key in the PPK format.

      7. When PuTTYgen prompts you to save the key without a pass phrase, click Yes.

      8. Enter a name for your PuTTY private key, such as mykeypair.ppk.

      9. Click Save.

      10. Exit the PuTTYgen application.
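On Linux, UNIX, and Mac OS X, the effect of the chmod og-rwx command above can be reproduced and checked from Ruby. The following sketch uses the standard `File.stat` and `File.chmod` calls; the helper names `owner_only?` and `restrict_to_owner` and the file name mykeypair.pem are placeholders. File permission bits behave differently on Windows, so this applies to the non-Windows case only.

```ruby
# Returns true when neither group nor others hold any permission bits.
def owner_only?(mode)
  (mode & 0077).zero?
end

# Clears the group and other permission bits and keeps everything else,
# which mirrors what "chmod og-rwx" does.
def restrict_to_owner(path)
  current = File.stat(path).mode
  File.chmod(current & 07700, path)
end

pem_path = "mykeypair.pem"  # placeholder: use the path to your own PEM file
if File.exist?(pem_path)
  restrict_to_owner(pem_path)
  puts "#{pem_path} owner-only: #{owner_only?(File.stat(pem_path).mode)}"
else
  puts "#{pem_path} not found; set pem_path to where you saved your key."
end
```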

Verify installation of the Amazon EMR CLI

  • In the directory where you installed the Amazon EMR CLI, run the following commands from the command line:

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --version

    • Windows users:

      ruby elastic-mapreduce --version

    If the CLI is correctly installed and the credentials properly configured, the CLI should display its version number represented as a date. The output should look similar to the following:

    Version 2012-12-17
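If you want to check the version output in a script, the date can be extracted with a short helper. This sketch assumes the output has exactly the form shown above ("Version" followed by a date in YYYY-MM-DD form); `parse_cli_version` is a hypothetical name, not something the CLI provides.

```ruby
require "date"

# Returns the version as a Date when the text looks like
# "Version 2012-12-17", or nil when it has any other form.
def parse_cli_version(output)
  match = output.match(/\AVersion (\d{4}-\d{2}-\d{2})\s*\z/)
  match && Date.strptime(match[1], "%Y-%m-%d")
end

version = parse_cli_version("Version 2012-12-17\n")
puts version  # prints "2012-12-17"
```

A nil result suggests that the CLI printed something other than a version line, such as an error about the installation or credentials.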