Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Connect to the Master Node Using SSH

Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window.

When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the master node of the cluster.

Using SSH to connect to the master node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the master node, run applications such as HBase, Hive, and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the master node. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.

To connect to the master node using SSH, you need the public DNS name of the master node and the private key of your Amazon EC2 key pair. You specify the Amazon EC2 key pair when you launch the cluster. If you launch a cluster from the console, you specify the key pair in the Security and Access section of the Create Cluster page. For more information about accessing your key pair, see Amazon EC2 Key Pairs in the Amazon EC2 User Guide for Linux Instances.

Retrieve the Public DNS Name of the Master Node

You can retrieve the master public DNS name using the Amazon EMR console, the AWS CLI, or the Amazon EMR CLI.

To retrieve the public DNS name of the master node using the Amazon EMR console

  1. In the Amazon EMR console, on the Cluster List page, click the link for your cluster.

  2. Note the Master public DNS value that appears at the top of the Cluster Details page.

    Note

    You may also click the SSH link beside the master public DNS name for instructions on creating an SSH connection with the master node.

To retrieve the public DNS name of the master node using the AWS CLI

  1. To retrieve the cluster identifier, type the following command.

    aws emr list-clusters

    The output lists your clusters including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

    "Status": {
        "Timeline": {
            "ReadyDateTime": 1408040782.374,
            "CreationDateTime": 1408040501.213
        },
        "State": "WAITING",
        "StateChangeReason": {
            "Message": "Waiting after step completed"
        }
    },
    "NormalizedInstanceHours": 4,
    "Id": "j-2AL4XXXXXX5T9",
    "Name": "My cluster"
  2. To list the cluster instances including the master public DNS name for the cluster, type one of the following commands. Replace j-2AL4XXXXXX5T9 with the cluster ID returned by the previous command.

    aws emr list-instances --cluster-id j-2AL4XXXXXX5T9

    Or:

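    aws emr describe-cluster --cluster-id j-2AL4XXXXXX5T9

    The list-instances output lists the cluster instances, including their DNS names and IP addresses. Note the value for PublicDnsName. In the describe-cluster output, the same value appears as MasterPublicDnsName.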

    "Status": {
        "Timeline": {
            "ReadyDateTime": 1408040779.263,
            "CreationDateTime": 1408040515.535
        },
        "State": "RUNNING",
        "StateChangeReason": {}
    },
    "Ec2InstanceId": "i-e89b45e7",
    "PublicDnsName": "ec2-###-##-##-###.us-west-2.compute.amazonaws.com",
    "PrivateDnsName": "ip-###-##-##-###.us-west-2.compute.internal",
    "PublicIpAddress": "##.###.###.##",
    "Id": "ci-12XXXXXXXXFMH",
    "PrivateIpAddress": "###.##.#.###"
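
If you only need the DNS name itself, you can have the AWS CLI filter the response instead of reading through the JSON. The following is a minimal sketch that wraps aws emr describe-cluster with the CLI's global --query (JMESPath) and --output text options. The function name get_master_dns is hypothetical, and the sketch assumes the AWS CLI is installed and configured with credentials and a default region that match the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print the master public DNS name of an EMR cluster.
# Assumes the AWS CLI is installed and configured (credentials and region).
get_master_dns() {
    cluster_id="$1"
    if [ -z "$cluster_id" ]; then
        echo "usage: get_master_dns CLUSTER_ID" >&2
        return 1
    fi
    # --query selects one field from the JSON response;
    # --output text prints it without quotes.
    aws emr describe-cluster \
        --cluster-id "$cluster_id" \
        --query 'Cluster.MasterPublicDnsName' \
        --output text
}

# Example usage:
#   get_master_dns j-2AL4XXXXXX5T9
```

Because the function prints only the name, you can capture it in a variable, for example MASTER_DNS=$(get_master_dns j-2AL4XXXXXX5T9), and reuse it in the ssh command.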

For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.

To retrieve the public DNS name of the master node using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • You can retrieve the master public DNS name using the Amazon EMR CLI. For more information, see the Command Line Interface Reference for Amazon EMR. In the directory where you installed the Amazon EMR CLI, type the following command.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --list
    • Windows users:

      ruby elastic-mapreduce --list

Connect to the Master Node Using SSH on Linux, Unix, and Mac OS X

Your computer most likely includes an SSH client by default. For example, OpenSSH is installed on most Linux, Unix, and Mac OS X operating systems. You can check for an SSH client by typing ssh at the command line. If your computer doesn't recognize the command, you must install an SSH client to connect to the master node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, go to http://www.openssh.org.
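
The check for an SSH client can also be scripted. This minimal sketch uses the POSIX command -v builtin, which succeeds only if the named command is found on the PATH; the function name require_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a command is on the PATH, otherwise
# print an error to stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "error: '$1' was not found on the PATH" >&2
    return 1
}

# Example usage:
#   require_cmd ssh && echo "SSH client found"
```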

The following instructions demonstrate opening an SSH connection to the Amazon EMR master node on Linux, Unix, and Mac OS X.

To configure the key pair private key file permissions

Before you can use your Amazon EC2 key pair private key to create an SSH connection, you must set permissions on the .pem file so that only the key owner has permission to access the file. This is required for creating an SSH connection using terminal or the AWS CLI.

  1. Locate your .pem file. These instructions assume that the file is named mykeypair.pem and that it is stored in the current user's home directory.

  2. Type the following command to set the permissions. Replace ~/mykeypair.pem with the location and file name of your key pair private key file.

    chmod 400 ~/mykeypair.pem

    If you do not set permissions on the .pem file, you will receive an error indicating that your key file is unprotected and the key will be rejected. To connect, you only need to set permissions on the key pair private key file the first time you use it.
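
Steps 1 and 2 can be combined into a small guard that fails early if the key file is missing. The following is a minimal sketch; the function name prepare_key is hypothetical, and it simply applies the same chmod 400 shown above.

```shell
#!/bin/sh
# Hypothetical helper: make a private key file readable only by its owner,
# equivalent to running chmod 400 on it.
prepare_key() {
    key_file="$1"
    if [ ! -f "$key_file" ]; then
        echo "error: key file '$key_file' not found" >&2
        return 1
    fi
    chmod 400 "$key_file"
}

# Example usage:
#   prepare_key ~/mykeypair.pem
```

Running it more than once is harmless, because chmod 400 sets the same permissions each time.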

To connect to the master node using terminal

  1. Open a terminal window. On Mac OS X, choose Applications > Utilities > Terminal. On Linux distributions, Terminal is typically found at Applications > Accessories > Terminal.

  2. To establish a connection to the master node, type the following command. Replace ec2-###-##-##-###.compute-1.amazonaws.com with the master public DNS name of your cluster and replace ~/mykeypair.pem with the location and file name of your .pem file.

    ssh hadoop@ec2-###-##-##-###.compute-1.amazonaws.com -i ~/mykeypair.pem

    Important

    You must use the login name hadoop when you connect to the Amazon EMR master node. Otherwise, you may see an error similar to Server refused our key.

  3. A warning states that the authenticity of the host you are connecting to cannot be verified. Type yes to continue.

  4. When you are done working on the master node, type the following command to close the SSH connection.

    exit
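
If you connect often, you can wrap step 2 in a function so that the hadoop login name is always used. This is a minimal sketch: the function name connect_master and the default key path ~/mykeypair.pem are assumptions. It places the -i option before the destination, the conventional OpenSSH ordering; the command in step 2 is otherwise the same.

```shell
#!/bin/sh
# Hypothetical helper: open an SSH session to the EMR master node
# as the hadoop user.
connect_master() {
    master_dns="$1"
    # Default key location is an assumption; pass a second argument to override.
    key_file="${2:-$HOME/mykeypair.pem}"
    if [ -z "$master_dns" ]; then
        echo "usage: connect_master MASTER_PUBLIC_DNS [KEY_FILE]" >&2
        return 1
    fi
    ssh -i "$key_file" "hadoop@$master_dns"
}

# Example usage (replace with your master public DNS name):
#   connect_master ec2-###-##-##-###.compute-1.amazonaws.com ~/mykeypair.pem
```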

Connect to the Master Node Using the AWS CLI or Amazon EMR CLI

You can create an SSH connection with the master node using the AWS CLI or Amazon EMR CLI on Windows and on Linux, Unix, and Mac OS X. Regardless of the platform, you need the public DNS name of the master node and your Amazon EC2 key pair private key. If you are using the AWS CLI or Amazon EMR CLI on Linux, Unix, or Mac OS X, you must also set permissions on the private key (.pem) file as shown in To configure the key pair private key file permissions.

To connect to the master node using the AWS CLI

  1. To retrieve the cluster identifier, type:

    aws emr list-clusters

    The output lists your clusters including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

    "Status": {
        "Timeline": {
            "ReadyDateTime": 1408040782.374,
            "CreationDateTime": 1408040501.213
        },
        "State": "WAITING",
        "StateChangeReason": {
            "Message": "Waiting after step completed"
        }
    },
    "NormalizedInstanceHours": 4,
    "Id": "j-2AL4XXXXXX5T9",
    "Name": "AWS CLI cluster"
  2. Type the following command to open an SSH connection to the master node. In the following example, replace j-2AL4XXXXXX5T9 with the cluster ID and replace ~/mykeypair.key with the location and file name of your .pem file (for Linux, Unix, and Mac OS X) or .ppk file (for Windows).

    aws emr ssh --cluster-id j-2AL4XXXXXX5T9 --key-pair-file ~/mykeypair.key
  3. When you are done working on the master node, type exit to close the SSH connection, or close the terminal window.

    For more information on using Amazon EMR commands in the AWS CLI, see http://docs.aws.amazon.com/cli/latest/reference/emr.
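
Steps 1 and 2 can be chained so that you do not have to copy the cluster ID by hand. The following is a minimal sketch under stated assumptions: the function name emr_connect is hypothetical, the AWS CLI is configured, and, when no cluster ID is given, the first cluster returned by aws emr list-clusters --active is assumed to be the one you want.

```shell
#!/bin/sh
# Hypothetical helper: open `aws emr ssh` to a given cluster, or to the
# first active cluster if no cluster ID is passed.
emr_connect() {
    key_file="$1"
    cluster_id="$2"
    if [ -z "$key_file" ]; then
        echo "usage: emr_connect KEY_FILE [CLUSTER_ID]" >&2
        return 1
    fi
    if [ -z "$cluster_id" ]; then
        # --active lists only clusters that are still active (for example,
        # RUNNING or WAITING); --query picks the first cluster's ID.
        cluster_id=$(aws emr list-clusters --active \
            --query 'Clusters[0].Id' --output text) || return 1
    fi
    # With --output text, an empty result is printed as "None".
    if [ -z "$cluster_id" ] || [ "$cluster_id" = "None" ]; then
        echo "error: no active cluster found" >&2
        return 1
    fi
    aws emr ssh --cluster-id "$cluster_id" --key-pair-file "$key_file"
}

# Example usage:
#   emr_connect ~/mykeypair.pem j-2AL4XXXXXX5T9
```

If several clusters are active, pass the cluster ID explicitly rather than relying on the first result.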

To connect to the master node using the Amazon EMR CLI

Note

The Amazon EMR CLI is no longer under feature development. Customers are encouraged to use the Amazon EMR commands in the AWS CLI instead.

  • To connect to the master node using the Amazon EMR CLI on Linux, Unix, and Mac OS X, first configure your credentials.json file so that the keypair value is the name of the key pair you used to launch the cluster and the key-pair-file value is the full path to your private key file. You must also set appropriate permissions on the .pem file and install an SSH client, such as OpenSSH, on your machine. You can then open an SSH connection to the master node by issuing the following command, a handy shortcut for frequent CLI users. In the following example, replace j-3L7WXXXXXHO4H with your cluster identifier.

    • Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce -j j-3L7WXXXXXHO4H --ssh
    • Windows users:

      ruby elastic-mapreduce -j j-3L7WXXXXXHO4H --ssh

Connect to the Master Node Using SSH on Windows

Windows users can use an SSH client such as PuTTY to connect to the master node. Before connecting to the Amazon EMR master node, you should download and install PuTTY and PuTTYgen. You can download these tools from the PuTTY download page.

PuTTY does not natively support the key pair private key file format (.pem) generated by Amazon EC2. Before you connect to the master node using PuTTY, use PuTTYgen to convert your key file to the PuTTY format (.ppk).

For more information about converting your key, see Converting Your Private Key Using PuTTYgen in the Amazon EC2 User Guide for Linux Instances.

To connect to the master node using PuTTY

  1. Double-click putty.exe to start PuTTY. You can also launch PuTTY from the Windows programs list.

  2. If necessary, in the Category list, click Session.

  3. In the Host Name (or IP address) field, type hadoop@MasterPublicDNS. For example: hadoop@ec2-###-##-##-###.compute-1.amazonaws.com.

  4. In the Category list, expand Connection > SSH, and then click Auth.

  5. For Private key file for authentication, click Browse and select the .ppk file that you generated.

  6. Click Open.

  7. Click Yes to dismiss the PuTTY security alert.

    Important

    When logging in to the master node, type hadoop if you are prompted for a user name.

  8. When you are done working on the master node, you can close the SSH connection by closing PuTTY.

    Note

    To prevent the SSH connection from timing out, you can click Connection in the Category list and select the option Enable TCP keepalives. If you have an active SSH session in PuTTY, you can change your settings by right-clicking the PuTTY title bar and choosing Change Settings.