Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Open an SSH Tunnel to the Master Node

Hadoop, Ganglia, and other applications publish user interfaces as websites hosted on the master node. For security reasons, these websites are available only on the master node's local webserver (http://localhost:port) and are not published on the Internet. To connect to the local webserver on the master node, you can create an SSH tunnel between your computer and the master node. This technique is also known as dynamic port forwarding; the SSH client on your computer acts as a SOCKS proxy server. For more information about the sites you might want to view on the master node, see Web Interfaces Hosted on the Master Node.

Before you begin, you'll need the public DNS name of the master node. For information about how to locate this value, see To locate the public DNS name of the master node using the Amazon EMR console. You must also have specified an Amazon EC2 key pair when you launched the cluster, because the key pair serves as the credentials for the SSH connection. If you launch the cluster from the console, you specify the Amazon EC2 key pair on the ADVANCED OPTIONS pane of the Create a New Job Flow wizard.

Note

To permit SSH access to a master node, you must add your external source IP for TCP Port 22 to the ingress rules on the master node security group. For more information, see Adding a Security Group Rule in the Amazon Elastic Compute Cloud User Guide.

To create an SSH tunnel to the master node using Linux/Unix/Mac OS X

  • Open an SSH tunnel on your local machine using the following command:

    ssh -i path-to-keyfile -ND port_number hadoop@master-public-DNS-name

    The following shows the command with example values filled in.

    ssh -i ~/ec2-keys/myKeyPairName -ND 8157 hadoop@ec2-107-22-74-202.compute-1.amazonaws.com

    After you issue this command, the terminal remains open and does not return a command prompt. It is now acting as a SOCKS server on the port you specified.
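
The command in this procedure takes three values: the key file, the local port, and the public DNS name of the master node. The following is a minimal sketch of a shell function that checks those values before it starts ssh. The function name open_tunnel and the DRY_RUN variable are hypothetical and are not part of any AWS tool; they are used here only for illustration.

```shell
#!/bin/sh
# Sketch only: open_tunnel and DRY_RUN are illustrative names, not AWS features.
# Usage: open_tunnel ~/ec2-keys/myKeyPairName 8157 master-public-DNS-name

open_tunnel() {
    keyfile=$1
    port=$2
    master_dns=$3

    # All three values are required.
    if [ -z "$keyfile" ] || [ -z "$port" ] || [ -z "$master_dns" ]; then
        echo "usage: open_tunnel keyfile port master-public-DNS-name" >&2
        return 2
    fi

    # The port must consist of digits only.
    case $port in
        *[!0-9]*)
            echo "port must be numeric: $port" >&2
            return 2
            ;;
    esac

    # The port must fall in the valid TCP range.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 2
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "ssh -i $keyfile -ND $port hadoop@$master_dns"
    else
        # -N: do not run a remote command.
        # -D: listen on the local port and act as a SOCKS server.
        ssh -i "$keyfile" -ND "$port" "hadoop@$master_dns"
    fi
}
```

With DRY_RUN set to 1, the function prints the ssh command rather than running it, so you can confirm the values first. Without it, the terminal stays open exactly as described above.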

To create an SSH tunnel to the master node using the CLI on Linux/Unix/Mac OS X

  • If you have the Amazon EMR CLI installed, you can open an SSH tunnel to the master node with a single command, provided that all of the following are true: the "keypair" value in your credentials.json file is set to the name of the key pair you used to launch the cluster; the "key-pair-file" value is set to the full path to your key pair .pem file; the permissions on the .pem file are set to og-rwx, as shown in To configure the permissions of the keypair file using Linux/Unix/Mac OS X; and OpenSSH is installed on your machine. This is a handy shortcut for frequent CLI users. In the example below, replace j-3L7WK3E07HO4H with the identifier of the cluster to which you want to open an SSH tunnel for use as a SOCKS server.

    In the directory where you installed the Amazon EMR CLI, run the following from the command line. For more information, see the Command Line Interface Reference for Amazon EMR.

    ./elastic-mapreduce -j j-3L7WK3E07HO4H --socks

    Note

    The --socks feature is available only on the CLI version 2012-06-12 and later. To find out what version of the CLI you have, run elastic-mapreduce --version at the command line. You can download the latest version of the CLI from http://aws.amazon.com/code/Elastic-MapReduce/2264.
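
One of the prerequisites above is that the .pem file has its permissions set to og-rwx. The following is a minimal sketch of a check you could run before you use the CLI. The function name key_is_private is hypothetical and is used here only for illustration. It inspects the mode string printed by ls rather than calling stat, because stat options differ between Linux and Mac OS X.

```shell
#!/bin/sh
# Sketch only: key_is_private is an illustrative name, not part of the Amazon EMR CLI.
# Returns 0 if group and other have no permissions on the key file,
# 1 if they have any, and 2 if the file does not exist.

key_is_private() {
    keyfile=$1

    if [ ! -f "$keyfile" ]; then
        echo "no such key file: $keyfile" >&2
        return 2
    fi

    # In the ls -l mode string, character 1 is the file type, 2-4 are the
    # owner bits, and 5-10 are the group and other bits.
    mode=$(ls -ld "$keyfile" | cut -c5-10)

    [ "$mode" = "------" ]
}

# Example usage (the path is a placeholder):
#   key_is_private ~/ec2-keys/myKeyPairName.pem || chmod og-rwx ~/ec2-keys/myKeyPairName.pem
```

If the check fails, running chmod og-rwx on the file removes the group and other permissions.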

After you've created an SSH tunnel to the master node, you can browse the websites hosted there using the text-based browser Lynx, or set up proxies in Firefox using the FoxyProxy add-on. This latter technique gives you full access to the graphical version of the web pages hosted locally on the master node. For more information, see Configure FoxyProxy to View Websites Hosted on the Master Node.

If you are using a browser with a SOCKS proxy configured, as described in Configure FoxyProxy to View Websites Hosted on the Master Node, you can use the browser to access any cluster launched in the same region as the one you used to create the SSH tunnel. This works because all of the master nodes you launch in a region share the same security group and thus are able to access each other.

To close an SSH tunnel using Linux/Unix/Mac OS X

  • In the terminal, press Ctrl+C.