Launch clusters into a VPC
After you have a subnet that is configured to host Amazon EMR clusters, launch the
cluster in that subnet by specifying the associated subnet identifier when
creating the cluster.
Amazon EMR supports private subnets in release versions 4.2 and above.
When the cluster is launched, Amazon EMR adds security groups based on whether the
cluster is launched into a private or a public VPC subnet. All of these security groups
allow inbound traffic on port 8443 for communication with the Amazon EMR service, but the
allowed IP address ranges differ between public and private subnets. Amazon EMR manages
all of these security groups and might need to add IP addresses to the AWS range over
time. For more information, see Control network traffic with security groups.
To manage a cluster in a VPC, Amazon EMR attaches a network device to the primary
node and manages the cluster through this device. You can view this device with the
Amazon EC2 API action DescribeInstances. If you modify this device in
any way, the cluster might fail.
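If you want to look at this device without changing it, the following sketch calls the DescribeInstances action through the AWS CLI. It assumes a configured AWS CLI and a cluster that uses instance groups rather than instance fleets; the function names primary_node_instance_id and instance_network_interfaces are hypothetical.

```shell
# Sketch: view (never modify) the network interfaces on a cluster's primary node.
# Assumes a configured AWS CLI and an instance-group cluster; names are hypothetical.

# Print the EC2 instance ID of the primary node for a cluster ID ($1).
primary_node_instance_id() {
  aws emr list-instances --cluster-id "$1" \
    --instance-group-types MASTER \
    --query 'Instances[].Ec2InstanceId' \
    --output text
}

# Print the network interfaces attached to an EC2 instance ID ($1)
# using the DescribeInstances action.
instance_network_interfaces() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[].Instances[].NetworkInterfaces[].[NetworkInterfaceId,SubnetId,PrivateIpAddress]' \
    --output table
}
```

For example: instance_network_interfaces "$(primary_node_instance_id j-XXXXXXXXXXXXX)", where j-XXXXXXXXXXXXX is a placeholder cluster ID.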
We’ve redesigned the Amazon EMR console to make it easier to use. See What's new with the console? to learn about the differences between the old and new console experiences.
New console
To launch a cluster into a VPC with the new console
- Sign in to the AWS Management Console, and open the Amazon EMR console at https://console.aws.amazon.com/emr.
- Under EMR on EC2 in the left navigation pane, choose Clusters, and then choose Create cluster.
- Under Networking, go to the Virtual private cloud (VPC) field. Enter the name of your VPC or choose Browse to select your VPC. Alternatively, choose Create VPC to create a VPC that you can use for your cluster.
- Choose any other options that apply to your cluster.
- To launch your cluster, choose Create cluster.
Old console
To launch a cluster into a VPC with the old console
- Navigate to the new Amazon EMR console and select Switch to the old console from the side navigation. For more information about what to expect when you switch to the old console, see Using the old console.
- Choose Create cluster.
- Choose Go to advanced options.
- In the Hardware Configuration section, for Network, select the ID of a VPC network that you created previously.
- For EC2 Subnet, select the ID of a subnet that you created previously.
  - If your private subnet is properly configured with a NAT instance and an S3 endpoint, the console displays (EMR Ready) above the subnet names and identifiers.
  - If your private subnet does not have a NAT instance, an S3 endpoint, or both, you can configure them by choosing Add S3 endpoint and NAT instance, Add S3 endpoint, or Add NAT instance. Select the desired options for your NAT instance and S3 endpoint, and then choose Configure.
    To create a NAT instance from the Amazon EMR console, you need the ec2:CreateRoute, ec2:RevokeSecurityGroupEgress, ec2:AuthorizeSecurityGroupEgress, cloudformation:DescribeStackEvents, and cloudformation:CreateStack permissions.
    Launching an Amazon EC2 instance for your NAT device incurs additional cost.
- Proceed with creating the cluster.
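If you manage these permissions yourself, a policy document along the following lines grants the actions listed in the old console procedure for creating a NAT instance. This is a sketch, not an official policy: the function name, file name, and policy name are hypothetical, and "Resource": "*" is a broad placeholder that you would normally scope down.

```shell
# Sketch: write an IAM policy document granting the permissions needed to
# create a NAT instance from the Amazon EMR console.
# Assumptions: function name and output path are hypothetical, and
# "Resource": "*" is a placeholder to narrow for production use.
write_nat_setup_policy() {
  cat > "$1" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateRoute",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupEgress",
        "cloudformation:DescribeStackEvents",
        "cloudformation:CreateStack"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Example (the policy name EmrNatSetup is hypothetical):
#   write_nat_setup_policy nat-setup-policy.json
#   aws iam create-policy --policy-name EmrNatSetup \
#     --policy-document file://nat-setup-policy.json
```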
AWS CLI
To launch a cluster into a VPC with the AWS CLI
The AWS CLI does not provide a way to create a NAT instance
automatically and connect it to your private subnet.
However, you can use the Amazon VPC CLI commands to create an S3 endpoint
for your subnet. Use the console if you need to create NAT instances
for launching clusters in a private subnet.
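As one example of the Amazon VPC CLI commands, the following sketch creates a gateway endpoint for Amazon S3 with aws ec2 create-vpc-endpoint and associates it with a route table. It assumes a configured AWS CLI; the function name and the vpc-XXXXXXXX, rtb-XXXXXXXX, and us-east-1 values are placeholders. Pass the route table that is associated with your private subnet.

```shell
# Sketch: create a gateway VPC endpoint for Amazon S3 so that instances in a
# private subnet can reach S3 without a NAT device.
# Assumes a configured AWS CLI; the function name is hypothetical.
create_s3_gateway_endpoint() {
  local vpc_id="$1"          # VPC that contains the private subnet
  local region="$2"          # Region of the VPC, for example us-east-1
  local route_table_id="$3"  # Route table associated with the private subnet
  aws ec2 create-vpc-endpoint \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${region}.s3" \
    --route-table-ids "$route_table_id"
}

# Example:
#   create_s3_gateway_endpoint vpc-XXXXXXXX us-east-1 rtb-XXXXXXXX
```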
After your VPC is configured, you can launch Amazon EMR clusters in
it by using the create-cluster subcommand with the --ec2-attributes
parameter, which specifies the VPC subnet for your cluster.
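To find the subnet ID to pass in --ec2-attributes, you can list the subnets in your VPC. A minimal sketch, assuming a configured AWS CLI; the function name list_vpc_subnets and the vpc-XXXXXXXX value are hypothetical.

```shell
# Sketch: list the subnets of a VPC ($1) so you can pick the SubnetId value
# for --ec2-attributes. Assumes a configured AWS CLI; the name is hypothetical.
# MapPublicIpOnLaunch is only a hint: whether a subnet is public depends on
# whether its route table has a route to an internet gateway.
list_vpc_subnets() {
  aws ec2 describe-subnets \
    --filters "Name=vpc-id,Values=$1" \
    --query 'Subnets[].[SubnetId,AvailabilityZone,CidrBlock,MapPublicIpOnLaunch]' \
    --output table
}

# Example:
#   list_vpc_subnets vpc-XXXXXXXX
```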
- To create a cluster in a specific subnet, type the following command, replacing myKey with the name of your Amazon EC2 key pair and subnet-77XXXX03 with your subnet ID. Linux line continuation characters (\) are included for readability; remove them, or replace them with carets (^) in the Windows command prompt.
aws emr create-cluster --name "Test cluster" \
  --release-label emr-4.2.0 \
  --applications Name=Hadoop Name=Hive Name=Pig \
  --use-default-roles \
  --ec2-attributes KeyName=myKey,SubnetId=subnet-77XXXX03 \
  --instance-type m5.xlarge \
  --instance-count 3
When you specify the instance count without using the
--instance-groups
parameter, a single
primary node is launched, and the remaining instances are
launched as core nodes. All nodes use the instance type
specified in the command.
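If you prefer to state the node layout explicitly, the same cluster as in the command above can be described with --instance-groups instead of --instance-count. This sketch keeps the same placeholders (myKey, subnet-77XXXX03) and values: one primary (MASTER) node and two core nodes, all m5.xlarge. It assumes a configured AWS CLI, and the function name is hypothetical.

```shell
# Sketch: the cluster from the command above, with an explicit layout of
# one primary (MASTER) node and two core nodes, all m5.xlarge.
# Assumes a configured AWS CLI; the function name is hypothetical.
create_cluster_with_instance_groups() {
  local key_name="$1" subnet_id="$2"
  aws emr create-cluster --name "Test cluster" \
    --release-label emr-4.2.0 \
    --applications Name=Hadoop Name=Hive Name=Pig \
    --use-default-roles \
    --ec2-attributes "KeyName=${key_name},SubnetId=${subnet_id}" \
    --instance-groups \
      InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m5.xlarge \
      InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge
}

# Example:
#   create_cluster_with_instance_groups myKey subnet-77XXXX03
```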
If you have not previously created the default Amazon EMR
service role and EC2 instance profile, type aws
emr create-default-roles
to create them
before typing the create-cluster
subcommand.
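In a script, you can skip this step when the roles already exist. A sketch, assuming a configured AWS CLI and the default names EMR_DefaultRole (service role) and EMR_EC2_DefaultRole (EC2 instance profile); the function name is hypothetical.

```shell
# Sketch: run aws emr create-default-roles only when the default service
# role or the default EC2 instance profile is missing.
# Assumes a configured AWS CLI; the function name is hypothetical.
ensure_emr_default_roles() {
  if aws iam get-role --role-name EMR_DefaultRole >/dev/null 2>&1 &&
    aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole >/dev/null 2>&1; then
    echo "Default Amazon EMR roles already exist."
  else
    aws emr create-default-roles
  fi
}

# Example: call ensure_emr_default_roles before aws emr create-cluster.
```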