Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Connect to the Master Node Using SSH

Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. Once you've made this connection, it's as if the terminal on your local computer is running on the remote computer. Commands you issue locally run on the remote computer and the output of those commands from the remote computer appears in your terminal window.

When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.

Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as HBase, Hive, and Pig interactively, browse directories, read log files, and more.

To connect to the master node using SSH, you need the public DNS name of the master node. You also must specify an Amazon EC2 key pair when you launch the cluster, as you use the key pair as the credentials for the SSH connection. If you are launching the cluster from the console, the Amazon EC2 key pair is specified on the ADVANCED OPTIONS pane of the Create a New Job Flow wizard.

Note

To permit SSH access to a master node, you must add your external source IP for TCP Port 22 to the ingress rules on the master node security group. For more information, see Adding a Security Group Rule in the Amazon Elastic Compute Cloud User Guide.

To locate the public DNS name of the master node using the Amazon EMR console

  • In the Amazon EMR console, select the cluster from the list of clusters in the WAITING or RUNNING state. Details about the cluster appear in the lower pane.

    The DNS name you use to connect to the instance is listed on the Description tab as Master public DNS name.

To locate the public DNS name of the master node using the CLI

  • If you have the Amazon EMR CLI installed, you can retrieve the public DNS name of the master node by running the following command from the directory where you installed the CLI. For more information, see the Command Line Interface Reference for Amazon EMR.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --list
    • Windows users:

      ruby elastic-mapreduce --list

    This returns a list of all the currently active clusters in the following format. In the example below, ec2-204-236-242-218.compute-1.amazonaws.com is the public DNS name of the master node for the cluster j-3L7WK3E07HO4H.

    j-3L7WK3E07HO4H     WAITING        ec2-204-236-242-218.compute-1.amazonaws.com       My Job Flow
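
When several clusters are listed, picking out the right DNS name by eye is error-prone. The following is a minimal sketch, not part of the Amazon EMR CLI, that filters the list output for one cluster identifier. The function name master_dns_for_cluster is hypothetical, and the sketch assumes the column order shown in the sample line above (identifier, state, DNS name, cluster name).

```shell
# Print the master public DNS name for one cluster ID, reading
# `elastic-mapreduce --list` output on standard input.
# Assumes the column order in this guide's sample: ID, state, DNS name, name.
master_dns_for_cluster() {
  awk -v id="$1" '$1 == id { print $3; exit }'
}

# Demonstration using the sample line from this guide.
printf '%s\n' 'j-3L7WK3E07HO4H     WAITING        ec2-204-236-242-218.compute-1.amazonaws.com       My Job Flow' |
  master_dns_for_cluster j-3L7WK3E07HO4H
```

With the CLI installed, the same function could be fed live output, for example ./elastic-mapreduce --list | master_dns_for_cluster j-3L7WK3E07HO4H. If no line matches the identifier, nothing is printed.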

OpenSSH is installed on most Linux, Unix, and Mac OS X operating systems. Windows users can use an application called PuTTY to connect to the master node. Following are platform-specific instructions for opening an SSH connection.

To configure the permissions of the keypair file using Linux/Unix/Mac OS X

  • Before you can use the keypair file to create an SSH connection, you must set permissions on the PEM file for your Amazon EC2 key pair so that only the key owner has permissions to access the key. For example, if you saved the file as mykeypair.pem in the user's home directory, the command is:

    chmod og-rwx ~/mykeypair.pem 

    If you do not do this, SSH reports that your private key file is unprotected and rejects the key. You only need to configure these permissions the first time you use the private key to connect.
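
To confirm that the permissions took effect before attempting a connection, a check along the following lines may help. This is a sketch under assumptions: the function name key_file_is_private is hypothetical, and it inspects the mode string printed by ls rather than calling any SSH tooling.

```shell
# Succeed only if the file exists and grants no permissions to group or others.
# Characters 5-10 of the `ls -l` mode string hold the group and other bits.
key_file_is_private() {
  [ -f "$1" ] || return 1
  [ "$(ls -ld -- "$1" | cut -c5-10)" = "------" ]
}

# Example usage (the path is the one assumed in this guide):
#   key_file_is_private ~/mykeypair.pem || chmod og-rwx ~/mykeypair.pem
```

Reading the mode string from ls avoids the stat command, whose options differ between Linux and Mac OS X.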

To connect to the master node using Linux/Unix/Mac OS X

  1. Open a terminal window. This is found at Applications/Utilities/Terminal on Mac OS X and at Applications/Accessories/Terminal on many Linux distributions.

  2. Check that SSH is installed by running the following command. If SSH is installed, this command returns the SSH version number. If SSH is not installed, you'll need to install the OpenSSH package from a repository.

    ssh -V

  3. To establish the connection to the master node, enter the following command line, which assumes the PEM file is in the user's home directory. Replace ec2-107-22-74-202.compute-1.amazonaws.com with the Master public DNS name of your cluster and replace ~/mykeypair.pem with the location and filename of your PEM file.

    ssh hadoop@ec2-107-22-74-202.compute-1.amazonaws.com -i ~/mykeypair.pem

    A warning states that the authenticity of the host you are connecting to can't be verified.

    Important

    You must use the login name hadoop when you connect to an Amazon EMR cluster node. Otherwise, an error similar to "Server refused our key" may occur.

  4. Type yes to continue.
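
If you connect to different clusters often, the command from step 3 can be assembled from its two variable parts. The sketch below uses a hypothetical helper named emr_ssh_command, which is not part of OpenSSH or the Amazon EMR CLI. It only prints the command so that you can review it before running it, and it does not quote paths that contain spaces.

```shell
# Print the ssh command for a master node instead of running it.
# Arguments: the master public DNS name, then the path to the PEM file.
emr_ssh_command() {
  dns_name=${1-}
  key_file=${2-}
  if [ -z "$dns_name" ] || [ -z "$key_file" ]; then
    echo "usage: emr_ssh_command MASTER_PUBLIC_DNS KEY_FILE" >&2
    return 2
  fi
  # The login name is always hadoop for an Amazon EMR cluster node.
  printf 'ssh -i %s hadoop@%s\n' "$key_file" "$dns_name"
}

# Demonstration with the example values used in this guide.
emr_ssh_command ec2-107-22-74-202.compute-1.amazonaws.com ~/mykeypair.pem
```

The shell expands ~ to your home directory before the function runs, so the printed command shows the full path to the key file.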

To connect to the master node using the CLI on Linux/Unix/Mac OS X

  • You can open an SSH connection to the master node with a single CLI command if all of the following are true: the Amazon EMR CLI is installed; in your credentials.json file, the "keypair" value is set to the name of the key pair you used to launch the cluster and the "key-pair-file" value is set to the full path to your key pair .pem file; the permissions on the .pem file are set to og-rwx as shown in To configure the permissions of the keypair file using Linux/Unix/Mac OS X; and OpenSSH is installed on your machine. This is a handy shortcut for frequent CLI users. In the example below, replace j-3L7WK3E07HO4H with the identifier of the cluster to connect to.

    ./elastic-mapreduce -j j-3L7WK3E07HO4H --ssh
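
Because the --ssh shortcut depends on two entries in credentials.json, a quick pre-flight check can catch a missing entry before the CLI is invoked. The following is a rough sketch: the function name credentials_ready_for_ssh is hypothetical, and it performs a plain text search for the two key names mentioned in this guide rather than parsing the JSON or validating the values.

```shell
# Succeed only if the given credentials.json mentions both keys that the
# --ssh option relies on according to this guide: "keypair" and "key-pair-file".
# This is a text search only; it does not validate the JSON or the values.
credentials_ready_for_ssh() {
  [ -f "$1" ] || return 1
  grep -q '"keypair"' "$1" && grep -q '"key-pair-file"' "$1"
}

# Example usage (assumes credentials.json is in the current directory):
#   credentials_ready_for_ssh credentials.json && ./elastic-mapreduce -j j-3L7WK3E07HO4H --ssh
```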

To close an SSH connection using Linux/Unix/Mac OS X

  • When you are done working on the master node, you can close the SSH connection using the exit command.

    exit

To install and configure PuTTY on Windows

  1. Download PuTTYgen.exe and PuTTY.exe to your computer from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

  2. Launch PuTTYgen.

  3. Click Load.

  4. Select the PEM file you created earlier. Note that you may have to change the file type filter from “PuTTY Private Key Files (*.ppk)” to “All Files (*.*)”.

  5. Click Open.

  6. Click OK on the PuTTYgen notice telling you the key was successfully imported.

  7. For the option Type of key to generate, choose SSH-2 RSA.

  8. Click Save private key to save the key in the PPK format.

  9. When PuTTYgen prompts you to save the key without a pass phrase, click Yes.

  10. Enter a name for your PuTTY private key, such as mykeypair.ppk.

  11. Click Save.

  12. Close PuTTYgen.

To connect to the master node using PuTTY on Windows

  1. Start PuTTY.

  2. Select Session in the Category list. Enter hadoop@DNS in the Host Name field, where DNS is the Master public DNS name of your cluster. The input looks similar to hadoop@ec2-184-72-128-177.compute-1.amazonaws.com.

  3. In the Category list, expand Connection, expand SSH, and then select Auth. The Options controlling the SSH authentication pane appears.

  4. For Private key file for authentication, click Browse and select the private key file you generated earlier. If you are following this guide, the file name is mykeypair.ppk.

  5. Click Open.

    A PuTTY Security Alert pops up.

  6. Click Yes for the PuTTY Security Alert.

    Important

    If you are asked to log in, enter hadoop.