Connect to the Cluster
When you run an Amazon EMR cluster, often all you need to do is run an application to analyze your data and then collect the output from an Amazon S3 bucket. At other times, you may want to interact with the master node while the cluster is running. For example, you may want to connect to the master node to run interactive queries, check log files, debug a problem with the cluster, and so on. The following sections describe techniques that you can use to connect to the master node.
In an EMR cluster, the master node is an Amazon EC2 instance that coordinates the EC2 instances that are running as task and core nodes. The master node exposes a public DNS name that you can use to connect to it. By default, Amazon EMR creates security group rules for master and slave nodes that determine how you access the nodes. For example, the master node security group contains a rule that allows you to connect to the master node using an SSH client over TCP port 22.
You can connect to the master node only while the cluster is running. When the cluster terminates, the EC2 instance acting as the master node is terminated and is no longer available. To connect to the master node, you must also specify an Amazon EC2 key pair private key when you launch the cluster. The key pair private key provides the credentials for the SSH connection to the master node. If you launch a cluster from the console, the Amazon EC2 key pair private key is specified in the Security and Access section on the Create Cluster page.
By default, the ElasticMapReduce-master security group permits inbound SSH access from CIDR range 0.0.0.0/0. This allows SSH connections over TCP port 22 from any IP address using the appropriate credentials. You can limit this rule by identifying a specific IP address or address range suitable for your environment. For more information about modifying security group rules, see Adding Rules to a Security Group in the Amazon EC2 User Guide for Linux Instances.
Do not modify the remaining rules in the ElasticMapReduce-master security group. Modifying these rules may interfere with the operation of the cluster.