Amazon Elastic MapReduce
Developer Guide (API Version 2009-03-31)

Connect to the Cluster

Often when you run an Amazon Elastic MapReduce (Amazon EMR) cluster, all you need to do is launch the analysis and then collect the output from an Amazon S3 bucket. At other times, however, you'll want to interact with the master node while the cluster is running. For example, you may want to connect to the master node to run interactive queries, check log files, monitor performance using an application such as Ganglia that runs on the master node, or debug a problem with the cluster. The following sections describe techniques you can use to connect to the master node.

In an Amazon EMR cluster, the master node is an EC2 instance that coordinates the EC2 instances that are running as task and core nodes. The master node exposes a public DNS name that you can use to connect to it.

Note

You can connect to the master node only while the cluster is running. After the cluster terminates, the EC2 instance acting as the master node is terminated and is no longer available. You must also specify an Amazon EC2 key pair when you launch the cluster, because the key pair serves as the credentials for the SSH connection. If you launch the cluster from the console, you specify the Amazon EC2 key pair on the ADVANCED OPTIONS pane of the Create a New Job Flow wizard.
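As a rough illustration of how the public DNS name and the key pair combine into an SSH connection, the following Python sketch assembles the ssh command line. The function name build_ssh_command, the sample DNS name, and the key file name are hypothetical. The hadoop login name is an assumption that this page does not state, so confirm the correct user for your cluster.

```python
import shlex


def build_ssh_command(master_public_dns, key_file, user="hadoop"):
    """Return the argument list for an SSH session to the master node.

    master_public_dns: public DNS name exposed by the master node.
    key_file: path to the private key of the Amazon EC2 key pair
              specified when the cluster was launched.
    user: login name on the master node (assumed here, not taken
          from this page).
    """
    if not master_public_dns:
        raise ValueError("the master node public DNS name is required")
    if not key_file:
        raise ValueError("the key pair private key file is required")
    # -i selects the identity (private key) file used for authentication.
    return ["ssh", "-i", key_file, f"{user}@{master_public_dns}"]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    argv = build_ssh_command(
        "ec2-203-0-113-7.compute-1.amazonaws.com", "mykeypair.pem"
    )
    # Print a shell-ready command line rather than running it.
    print(shlex.join(argv))
```

The command can succeed only while the cluster is running, because the master node's EC2 instance is terminated along with the cluster.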

Note

To permit SSH access to the master node, you must add a rule to the ingress rules of the master node security group that allows traffic on TCP port 22 from your external source IP address. For more information, see Adding a Security Group Rule in the Amazon Elastic Compute Cloud User Guide.
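The ingress rule described in this note can be sketched in Python as follows. The helper names ssh_ingress_rule and authorize_ssh are hypothetical. Using the boto3 library is an assumption, because this guide does not mention it. The sketch handles only a single IPv4 source address, and the group ID you pass must be that of the master node security group for your cluster.

```python
import ipaddress

SSH_PORT = 22  # TCP port used for SSH access to the master node


def ssh_ingress_rule(source_ip):
    """Build an ingress rule allowing SSH from one IPv4 address.

    Raises ValueError if source_ip is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(source_ip)
    return {
        "IpProtocol": "tcp",
        "FromPort": SSH_PORT,
        "ToPort": SSH_PORT,
        # /32 restricts the rule to exactly this one address.
        "IpRanges": [{"CidrIp": f"{addr}/32"}],
    }


def authorize_ssh(group_id, source_ip):
    """Add the rule to a security group (assumes boto3 and credentials)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[ssh_ingress_rule(source_ip)],
    )
```

Limiting the rule to a single /32 address, rather than a wider range, keeps port 22 closed to every source other than the one you intend to connect from.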