Configure Cluster Hardware and Networking - Amazon EMR

An important consideration when you create an EMR cluster is how you configure Amazon EC2 instances and network options. This chapter covers the following options, and then ties them all together with best practices and guidelines.

  • Node types – EC2 instances in an EMR cluster are organized into node types. There are three: master nodes, core nodes, and task nodes. Each node type performs a set of roles defined by the distributed applications that you install on the cluster. During a Hadoop MapReduce or Spark job, for example, components on core and task nodes process data, transfer output to Amazon S3 or HDFS, and provide status metadata back to the master node. With a single-node cluster, all components run on the master node. For more information, see Understand Node Types: Master, Core, and Task Nodes.

  • EC2 instances – When you create a cluster, you choose the EC2 instance type that each node type runs on. The instance type determines the processing and storage profile of a node, and therefore the performance profile of each node type in your cluster. For more information, see Configure EC2 Instances.

  • Networking – You can launch your EMR cluster into a VPC using a public subnet, a private subnet, or a shared subnet. Your networking configuration determines how customers and services can connect to clusters to perform work, how clusters connect to data stores and other AWS resources, and the options you have for controlling traffic on those connections. For more information, see Configure Networking.

  • Instance grouping – The collection of EC2 instances that host each node type is called either an instance fleet or a uniform instance group. You choose the instance grouping configuration when you create a cluster. The choice determines how you can add nodes to your cluster while it is running, applies to all node types, and can't be changed after the cluster is created. For more information, see Create a Cluster with Instance Fleets or Uniform Instance Groups.

    Note

    The instance fleets configuration is available only in Amazon EMR release versions 4.8.0 and later, excluding 5.0.0 and 5.0.3.
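The options above can be sketched as the kind of hardware and networking settings you might pass as the Instances argument of the boto3 run_job_flow call. The following is a minimal sketch, not a definitive implementation: the helper names (parse_release, supports_instance_fleets, build_instances), the m5.xlarge instance type, the node counts, and the subnet ID are illustrative assumptions, and the release check simply encodes the note above. It builds a dictionary only and does not call AWS.

```python
from typing import Any, Dict, Tuple

# Releases that the note above excludes from instance fleet support.
EXCLUDED_RELEASES = {(5, 0, 0), (5, 0, 3)}


def parse_release(label: str) -> Tuple[int, ...]:
    """Turn a label such as 'emr-5.36.0' into (5, 36, 0).

    Assumes a simple dotted numeric version; other label shapes are not handled.
    """
    version = label[4:] if label.startswith("emr-") else label
    return tuple(int(part) for part in version.split("."))


def supports_instance_fleets(label: str) -> bool:
    """Apply the rule from the note: 4.8.0 and later, minus the excluded releases."""
    release = parse_release(label)
    return release >= (4, 8, 0) and release not in EXCLUDED_RELEASES


def build_instances(release_label: str, use_fleets: bool, subnet_id: str,
                    instance_type: str = "m5.xlarge",
                    core_count: int = 2) -> Dict[str, Any]:
    """Build an Instances-style dict using one grouping configuration.

    The grouping applies to every node type, so the result contains either
    InstanceFleets or InstanceGroups, never both.
    """
    if use_fleets:
        if not supports_instance_fleets(release_label):
            raise ValueError(f"{release_label} does not support instance fleets")
        return {
            "InstanceFleets": [
                {"Name": "master", "InstanceFleetType": "MASTER",
                 "TargetOnDemandCapacity": 1,
                 "InstanceTypeConfigs": [{"InstanceType": instance_type}]},
                {"Name": "core", "InstanceFleetType": "CORE",
                 "TargetOnDemandCapacity": core_count,
                 "InstanceTypeConfigs": [{"InstanceType": instance_type}]},
            ],
            # Instance fleets take a list of candidate subnets.
            "Ec2SubnetIds": [subnet_id],
        }
    return {
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": instance_type, "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": instance_type, "InstanceCount": core_count},
        ],
        # Uniform instance groups take a single subnet.
        "Ec2SubnetId": subnet_id,
    }


if __name__ == "__main__":
    # "subnet-EXAMPLE" is a placeholder, not a real subnet ID.
    print(build_instances("emr-5.36.0", use_fleets=True, subnet_id="subnet-EXAMPLE"))
```

Keeping the grouping choice in a single function mirrors the constraint that the configuration is fixed at cluster creation: the caller decides once, and every node type is described with the same grouping.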