Amazon EMR
Developer Guide

Launch Clusters into a VPC

After you have a subnet that is configured to host Amazon EMR clusters, launch the cluster in that subnet by specifying the associated subnet identifier when creating the cluster.

Note

Amazon EMR supports private subnets in release versions 4.2 and later. For more information about private subnets in Amazon EMR, see the Amazon EMR Management Guide.

When the cluster is launched, Amazon EMR adds security groups based on whether the cluster is launching into a private or a public VPC subnet. All security groups allow ingress on port 8443 so that the cluster can communicate with the Amazon EMR service, but the allowed IP address ranges differ between public and private subnets. Amazon EMR manages all of these security groups and may need to add IP addresses to the AWS range over time.

In public subnets, Amazon EMR creates the ElasticMapReduce-slave and ElasticMapReduce-master security groups for the slave and master instance groups, respectively. By default, the ElasticMapReduce-master security group allows inbound SSH connections and the ElasticMapReduce-slave group does not. Both security groups allow inbound traffic on port 8443 from the AWS public IP range. If you require SSH access to slave (core and task) nodes, you can add a rule to the ElasticMapReduce-slave security group or use SSH agent forwarding.
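As a rough sketch of the first option, an inbound SSH rule could be added to the ElasticMapReduce-slave security group with the AWS CLI. The function name allow_ssh, the group ID sg-0123abcd, and the CIDR range 203.0.113.0/24 are placeholders chosen for illustration, not values from this guide.

```shell
# Hypothetical helper: allow inbound SSH (TCP port 22) to a security group
# from one CIDR range. Both arguments are placeholders that you supply.
allow_ssh() {
  local group_id="$1"   # ID of the ElasticMapReduce-slave security group
  local cidr="$2"       # Trusted source range, such as your office network
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 22 \
    --cidr "$cidr"
}

# Example call with placeholder values:
#   allow_ssh sg-0123abcd 203.0.113.0/24
```

Restricting the rule to a narrow source range, rather than 0.0.0.0/0, keeps the slave nodes closed to the wider internet.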

Additional security groups and rules are required when launching clusters in a private subnet, so that the service can still manage those resources while they are private. The additional security groups are ElasticMapReduce-Master-Private and ElasticMapReduce-Slave-Private. The security group for the elastic network interface (ENI) is of the form ElasticMapReduce-ServiceAccess. Inbound traffic on port 8443 is open to allow contact with the Amazon EMR web service. Outbound traffic on ports 80 and 443 should be allowed so that the cluster can communicate back to the service. Furthermore, inbound and outbound ephemeral ports should be open in your network ACLs.

For more information about modifying security group rules, see Adding Rules to a Security Group in the Amazon EC2 User Guide for Linux Instances. For more information about connecting to instances in your VPC, see Securely connect to Linux instances running in a private Amazon VPC.

To manage the cluster on a VPC, Amazon EMR attaches a network device to the master node and manages it through this device. You can view this device using the Amazon EC2 API action DescribeInstances. If you modify this device in any way, the cluster may fail.
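One read-only way to inspect that device is the describe-instances command of the AWS CLI, which wraps the DescribeInstances action. In the sketch below, the function name show_network_interfaces and the instance ID in the example call are placeholders, and the --query expression is just one possible selection of fields.

```shell
# Hypothetical helper: list the network interfaces attached to one instance.
# This only reads information; it does not modify the interfaces.
show_network_interfaces() {
  local instance_id="$1"   # EC2 instance ID of the master node (placeholder)
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[].Instances[].NetworkInterfaces[].[NetworkInterfaceId,Description,PrivateIpAddress]' \
    --output table
}

# Example call with a placeholder instance ID:
#   show_network_interfaces i-0abc1234
```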

To launch a cluster into a VPC using the Amazon EMR console

  1. Open the Amazon EMR console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Choose Create cluster.

  3. In the Hardware Configuration section, for Network, select the ID of a VPC network that you created previously.

  4. For EC2 Subnet, select the ID of a subnet that you created previously.

    1. If your private subnet is properly configured with a NAT instance and an S3 endpoint, (EMR Ready) is displayed above the subnet names and identifiers.

    2. If your private subnet does not have a NAT instance and/or S3 endpoint, you can configure this by choosing Add S3 endpoint and NAT instance, Add S3 endpoint, or Add NAT instance. Select the desired options for your NAT instance and S3 endpoint and choose Configure.

      Important

      To create a NAT instance from the Amazon EMR console, you need the ec2:CreateRoute, ec2:RevokeSecurityGroupEgress, ec2:AuthorizeSecurityGroupEgress, cloudformation:DescribeStackEvents, and cloudformation:CreateStack permissions.

      Note

      There is an additional cost for launching an EC2 instance for your NAT device.

  5. Proceed with creating the cluster.
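The permissions named in the Important note under step 4 could be collected into a single IAM policy document. The sketch below writes such a document to a file. The function name write_nat_policy and the file name are hypothetical, and "Resource": "*" is an illustrative assumption that you would normally narrow.

```shell
# Hypothetical helper: write an IAM policy document that allows the five
# actions listed in the Important note. "Resource": "*" is an assumption
# made for brevity; scope it down for real use.
write_nat_policy() {
  local out_file="$1"
  cat > "$out_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateRoute",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupEgress",
        "cloudformation:DescribeStackEvents",
        "cloudformation:CreateStack"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Example with placeholder names:
#   write_nat_policy emr-nat-policy.json
#   aws iam put-user-policy --user-name myUser --policy-name EmrNatSetup \
#     --policy-document file://emr-nat-policy.json
```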

To launch a cluster into a VPC using the AWS CLI

Note

The AWS CLI does not provide a way to create a NAT instance automatically and connect it to your private subnet. However, you can use the Amazon VPC CLI commands to create an S3 endpoint in your subnet. Use the console to create NAT instances and launch clusters in a private subnet.
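As an illustration of the S3 endpoint step, the create-vpc-endpoint command can create an endpoint for Amazon S3 and associate it with a route table. The function name create_s3_endpoint and all IDs in the example call are placeholders. The sketch assumes that the route table you pass is the one used by your private subnet.

```shell
# Hypothetical helper: create an S3 endpoint in a VPC and associate it
# with one route table. All three arguments are placeholders you supply.
create_s3_endpoint() {
  local vpc_id="$1"           # VPC that contains the private subnet
  local route_table_id="$2"   # Route table used by the private subnet
  local region="$3"           # Region of the VPC, for example us-east-1
  aws ec2 create-vpc-endpoint \
    --vpc-id "$vpc_id" \
    --service-name "com.amazonaws.${region}.s3" \
    --route-table-ids "$route_table_id"
}

# Example call with placeholder values:
#   create_s3_endpoint vpc-1a2b3c4d rtb-11aa22bb us-east-1
```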

After your VPC is configured, you can launch EMR clusters in it by using the create-cluster subcommand with the --ec2-attributes parameter. Use the --ec2-attributes parameter to specify the VPC subnet for your cluster.

  • To create a cluster in a specific subnet, type the following command, replacing myKey with the name of your EC2 key pair and 77XXXX03 with your subnet ID.

    • Linux, UNIX, and Mac OS X users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.10 --applications Name=Hue Name=Hive Name=Pig \
      --use-default-roles --ec2-attributes KeyName=myKey,SubnetId=subnet-77XXXX03 \
      --instance-type m3.xlarge --instance-count 3
    • Windows users:

      aws emr create-cluster --name "Test cluster" --ami-version 3.10 --applications Name=Hue Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey,SubnetId=subnet-77XXXX03 --instance-type m3.xlarge --instance-count 3

    When you specify the instance count without using the --instance-groups parameter, a single master node is launched, and the remaining instances are launched as core nodes. All nodes use the instance type specified in the command.

    Note

    If you have not previously created the default Amazon EMR service role and EC2 instance profile, type aws emr create-default-roles to create them before typing the create-cluster subcommand.
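The default split described above (one master node, with the remaining instances as core nodes) can also be written out explicitly with the --instance-groups parameter. The sketch below should request the same three-node layout as the earlier command. The function name launch_cluster_with_groups is hypothetical, and the key pair name and subnet ID are passed in as placeholders.

```shell
# Hypothetical helper: launch the example cluster into a subnet, naming the
# master and core instance groups explicitly instead of using
# --instance-type and --instance-count.
launch_cluster_with_groups() {
  local key_name="$1"    # Name of your EC2 key pair
  local subnet_id="$2"   # Subnet to launch into, for example subnet-77XXXX03
  aws emr create-cluster \
    --name "Test cluster" \
    --ami-version 3.10 \
    --applications Name=Hue Name=Hive Name=Pig \
    --use-default-roles \
    --ec2-attributes "KeyName=${key_name},SubnetId=${subnet_id}" \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
      InstanceGroupType=CORE,InstanceCount=2,InstanceType=m3.xlarge
}

# Example call with placeholder values:
#   launch_cluster_with_groups myKey subnet-77XXXX03
```

Spelling out the groups is mainly useful when the master and core nodes should use different instance types or counts.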

For more information about using Amazon EMR commands in the AWS CLI, see the AWS CLI.