Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. Once you've made this connection, it's as if the terminal on your local computer is running on the remote computer. Commands you issue locally run on the remote computer and the output of those commands from the remote computer appears in your terminal window.
When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.
Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as HBase, Hive, and Pig interactively, browse directories, read log files, and more.
To connect to the master node using SSH, you need the public DNS name of the master node. You must also specify an Amazon EC2 key pair when you launch the cluster, because the key pair serves as the credentials for the SSH connection. If you are launching the cluster from the console, the Amazon EC2 key pair is specified on the ADVANCED OPTIONS pane of the Create a New Job Flow wizard.
To permit SSH access to a master node, you must add your external source IP for TCP Port 22 to the ingress rules on the master node security group. For more information, see Adding a Security Group Rule in the Amazon Elastic Compute Cloud User Guide.
To locate the public DNS name of the master node using the Amazon EMR console
In the Amazon EMR console, select the job from the list of running clusters in the RUNNING state. Details about the cluster are then displayed.
The DNS name you use to connect to the instance is listed on the Description tab as Master public DNS name.
To locate the public DNS name of the master node using the CLI
If you have the Amazon EMR CLI installed, you can retrieve the public DNS name of the master node by running the following command from the directory where you installed the CLI. For more information, see the Command Line Interface Reference for Amazon EMR.
Linux, UNIX, and Mac OS X users:
ruby elastic-mapreduce --list
This returns a list of all the currently active clusters in the following format. In the example below, ec2-204-236-242-218.compute-1.amazonaws.com is the public DNS name of the master node for the cluster j-3L7WK3E07HO4H.
j-3L7WK3E07HO4H WAITING ec2-204-236-242-218.compute-1.amazonaws.com My Job Flow
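As a minimal sketch of picking the master DNS name out of that listing, the snippet below filters a sample line by cluster identifier. It assumes the DNS name is always the third whitespace-separated column, as in the example above; the function name master_dns_for is hypothetical and not part of the Amazon EMR CLI.

```shell
# Sample line in the format shown above: cluster ID, state, master DNS, name.
list_output='j-3L7WK3E07HO4H WAITING ec2-204-236-242-218.compute-1.amazonaws.com My Job Flow'

# Hypothetical helper: print the third field of the row whose first field
# equals the given cluster ID. Assumes the column order shown above.
master_dns_for() {
  awk -v id="$1" '$1 == id { print $3 }'
}

master_dns=$(printf '%s\n' "$list_output" | master_dns_for j-3L7WK3E07HO4H)
echo "$master_dns"
```

In practice you would pipe the real output of the list command into the helper instead of the sample string.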
OpenSSH is installed on most Linux, Unix, and Mac OS X operating systems. Windows users can use an application called PuTTY to connect to the master node. Following are platform-specific instructions for opening an SSH connection.
To configure the permissions of the keypair file using Linux/Unix/Mac OS X
Before you can use the keypair file to create an SSH connection, you must set permissions on the PEM file for your Amazon EC2 key pair so that only the key owner has permissions to access the key. For example, if you saved the file as mykeypair.pem in the user's home directory, the command is:
chmod og-rwx ~/mykeypair.pem
If you do not do this, SSH returns an error saying that your private key file is unprotected and rejects the key. You only need to configure these permissions the first time you use the private key to connect.
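To see the effect of the permission change, the following sketch applies the same chmod to a throwaway file and prints the resulting mode. The temporary file is only a stand-in for your PEM file, and the starting mode of 644 is an assumption chosen to imitate a key saved with default permissions.

```shell
# A throwaway file stands in for the real PEM file (assumption for illustration).
key_file=$(mktemp)
chmod 644 "$key_file"       # imitate a key file that is readable by group and others

chmod og-rwx "$key_file"    # same command as above: remove all group/other access

# The first 10 characters of `ls -l` are the file type and permission bits.
perms=$(ls -l "$key_file" | cut -c1-10)
echo "$perms"               # prints -rw-------
rm -f "$key_file"
```

Only the owner's read and write bits remain, which is the state SSH expects for a private key file.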
To connect to the master node using Linux/Unix/Mac OS X
Open a terminal window. This is found at Applications/Utilities/Terminal on Mac OS X and at Applications/Accessories/Terminal on many Linux distributions.
Check that SSH is installed by running the following command. If SSH is installed, this command returns the SSH version number. If SSH is not installed, you'll need to install the OpenSSH package from a repository.
ssh -V
To establish the connection to the master node, enter the following command line, which assumes the PEM file is in the user's home directory. Replace ec2-107-22-74-202.compute-1.amazonaws.com with the Master public DNS name of your cluster and replace ~/mykeypair.pem with the location and filename of your PEM file.
ssh -i ~/mykeypair.pem hadoop@ec2-107-22-74-202.compute-1.amazonaws.com
A warning states that the authenticity of the host you are connecting to can't be verified. Type yes to continue.
You must use the login name hadoop when you connect to an Amazon EMR cluster node; otherwise, an error similar to "Server refused our key" may occur.
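The connection steps above can be sketched as a small script. The variable names are hypothetical, the DNS name and key path are the placeholder values from the steps, and the script only prints the assembled command (a dry run) so that it can be inspected before anything connects.

```shell
# Placeholder values taken from the steps above; substitute your own.
master_dns="ec2-107-22-74-202.compute-1.amazonaws.com"
key_file="$HOME/mykeypair.pem"

# Report whether an ssh client is on the PATH; ssh -V prints its version.
if command -v ssh >/dev/null 2>&1; then
  ssh -V
else
  echo "ssh not found; install the OpenSSH package" >&2
fi

# Assemble the connection command. The login name must be hadoop.
ssh_command="ssh -i $key_file hadoop@$master_dns"
echo "$ssh_command"

# Uncomment to actually connect:
# ssh -i "$key_file" "hadoop@$master_dns"
```

Keeping the DNS name and key path in variables makes it easy to reuse the same script for different clusters.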
To connect to the master node using the CLI on Linux/Unix/Mac OS X
If you have the Amazon EMR CLI and OpenSSH installed, you can open an SSH connection to the master node with a single CLI command, which is a handy shortcut for frequent CLI users. This requires that, in your credentials.json file, the "keypair" value is set to the name of the keypair you used to launch the cluster and the "key-pair-file" value is set to the full path to your keypair .pem file. The permissions on the .pem file must also be set to og-rwx as shown in To configure the permissions of the keypair file using Linux/Unix/Mac OS X. In the command, specify the cluster identifier of the cluster to connect to.
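A sketch of what that CLI shortcut might look like follows. It assumes the CLI's --ssh option used together with --jobflow; neither option appears in the text above, so confirm both against the Command Line Interface Reference for Amazon EMR. The cluster identifier is the example value used earlier, and the script only prints the command rather than running it.

```shell
# Example cluster identifier from earlier in this topic; substitute your own.
cluster_id="j-3L7WK3E07HO4H"

# Assumed invocation: --jobflow selects the cluster, --ssh opens the session.
emr_ssh_command="ruby elastic-mapreduce --jobflow $cluster_id --ssh"
echo "$emr_ssh_command"

# Uncomment to run it from the directory where the CLI is installed:
# ruby elastic-mapreduce --jobflow "$cluster_id" --ssh
```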
To close an SSH connection using Linux/Unix/Mac OS X
When you are done working on the master node, you can close the SSH connection using the exit command.
To install and configure PuTTY on Windows
Download PuTTYgen.exe and PuTTY.exe to your computer from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html.
Select the PEM file you created earlier. Note that you may have to change the search parameters from files of type “PuTTY Private Key Files (*.ppk)” to “All Files (*.*)”.
Click OK on the PuTTYgen notice telling you the key was successfully imported.
For the option Type of key to generate, choose SSH-2 RSA.
Click Save private key to save the key in the PPK format.
When PuTTYgen prompts you to save the key without a pass phrase, click Yes.
Enter a name for your PuTTY private key, such as mykeypair.ppk.
To connect to the master node using PuTTY on Windows
Select Session in the Category list. Enter hadoop@DNS in the Host Name field, where DNS is the Master public DNS name of your cluster.
In the Category list, expand Connection, expand SSH, and then select Auth. The Options controlling the SSH authentication pane appears.
For Private key file for authentication, click Browse and select the private key file in PPK format that you generated earlier.
When the PuTTY Security Alert pops up, click Yes.
If you are asked to log in, enter hadoop.