Menu
Amazon EMR
Developer Guide

Amazon VPC Options

When launching an EMR cluster within a VPC, you can launch it within either a public or private subnet. There are slight, notable differences in configuration, depending on the subnet type you choose for a cluster.

Public Subnets

EMR clusters in a public subnet require a connected Internet gateway. This is because Amazon EMR clusters must access AWS services and Amazon EMR. If a service, such as Amazon S3, provides the ability to create a VPC endpoint, you can access those services using the endpoint instead of accessing a public endpoint through an Internet gateway. Additionally, Amazon EMR cannot communicate with clusters in public subnets through a network address translation (NAT) device. An Internet gateway is required for this purpose but you can still use a NAT instance or gateway for other traffic in more complex scenarios.

If you have additional AWS resources that you do not want connected to the Internet gateway, you can launch those components in a private subnet that you create within your VPC.

Clusters running in a public subnet use two security groups, ElasticMapReduce-master and ElasticMapReduce-slave, which control access to the master and slave instance groups, respectively.

Public Subnet Security Groups

Security Group Name Description Open Inbound Ports Open Outbound Ports
ElasticMapReduce-master Security group for master instance groups of clusters in a public subnet. TCP

0-65535

8443

22

UDP

0-65535

All
ElasticMapReduce-slave Security group for slave instance groups (containing core and task nodes) of clusters in a public subnet. TCP

0-65535

UDP

0-65535

All

The master instance group contains the master node while a slave group contains both task and core nodes of the cluster. All instances in a cluster connect to Amazon S3 through either a VPC endpoint or Internet gateway. Other AWS services which do not currently support VPC endpoints use only an Internet gateway.

The following diagram shows how an Amazon EMR cluster runs in a VPC using a public subnet. The cluster is able to connect to other AWS resources, such as Amazon S3 buckets, through the Internet gateway.


						Cluster on a VPC

The following diagram shows how to set up a VPC so that a cluster in the VPC can access resources in your own network, such as an Oracle database.


						Set up a VPC and cluster to access local VPN resources

On this page: