Plan and configure primary nodes

When you launch an Amazon EMR cluster, you can choose to have either one or three primary nodes. Launching a cluster with three primary nodes is supported only in Amazon EMR release 5.23.0 and later. Amazon EMR can use EC2 placement groups to place the primary nodes on distinct underlying hardware, which further improves cluster availability. For more information, see Amazon EMR integration with EC2 placement groups.
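The release and node-count constraints above can be checked before a launch request is sent. The following is a minimal Python sketch. The helper names `parse_release_label` and `validate_primary_node_count` are hypothetical and not part of any AWS SDK, and the sketch assumes release labels follow the `emr-X.Y.Z` form.

```python
from typing import Tuple

# Earliest release that supports three primary nodes (per the text above).
MIN_MULTI_PRIMARY_RELEASE: Tuple[int, ...] = (5, 23, 0)


def parse_release_label(release_label: str) -> Tuple[int, ...]:
    """Turn a label such as 'emr-5.23.0' into (5, 23, 0). Hypothetical helper."""
    prefix = "emr-"
    if not release_label.startswith(prefix):
        raise ValueError(f"unexpected release label: {release_label!r}")
    return tuple(int(part) for part in release_label[len(prefix):].split("."))


def validate_primary_node_count(release_label: str, count: int) -> None:
    """Raise ValueError if the primary node count is not allowed for this release."""
    if count not in (1, 3):
        raise ValueError(f"expected 1 or 3 primary nodes, got {count}")
    # Tuples compare element by element, so (5, 22, 1) < (5, 23, 0).
    if count == 3 and parse_release_label(release_label) < MIN_MULTI_PRIMARY_RELEASE:
        raise ValueError(
            f"three primary nodes require emr-5.23.0 or later, got {release_label}"
        )


if __name__ == "__main__":
    validate_primary_node_count("emr-6.15.0", 3)  # accepted, no exception
    try:
        validate_primary_node_count("emr-5.20.0", 3)
    except ValueError as err:
        print(err)  # three primary nodes require emr-5.23.0 or later, got emr-5.20.0
```

Comparing parsed version tuples avoids the string-comparison pitfall in which "emr-5.9.0" would sort after "emr-5.23.0".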

An Amazon EMR cluster with multiple primary nodes provides the following key benefits:

  • The primary node is no longer a single point of failure. If one of the primary nodes fails, the cluster keeps running without interruption on the other two primary nodes. In the meantime, Amazon EMR automatically replaces the failed primary node with a new one provisioned with the same configuration and bootstrap actions.

  • Amazon EMR enables the Hadoop high availability features of HDFS NameNode and YARN ResourceManager and supports high availability for a few other open source applications.

    For more information about how an Amazon EMR cluster with multiple primary nodes supports open source applications and other Amazon EMR features, see Supported applications and features.

Note

The cluster can reside in only one Availability Zone or subnet.
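As a rough illustration of the single-subnet constraint, the sketch below assembles keyword arguments in the shape accepted by the boto3 EMR client's `run_job_flow` call: three `MASTER` instances, one `Ec2SubnetId`, and a `SPREAD` placement group for the primary role. The helper name `build_multi_primary_request`, the default instance type, the core node count, and the default IAM role names are assumptions to adapt. The function only builds the dictionary and does not contact AWS.

```python
from typing import Any, Dict


def build_multi_primary_request(
    name: str,
    release_label: str,
    subnet_id: str,
    instance_type: str = "m5.xlarge",  # assumption: placeholder instance type
    core_count: int = 2,  # assumption: placeholder core node count
) -> Dict[str, Any]:
    """Build run_job_flow kwargs for a cluster with three primary nodes (hypothetical helper)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [
                # Three primary (MASTER) nodes instead of one.
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 3},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            # A single subnet, so the cluster lives in one Availability Zone.
            "Ec2SubnetId": subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Ask EMR to spread the primary nodes across distinct hardware.
        "PlacementGroupConfigs": [
            {"InstanceRole": "MASTER", "PlacementStrategy": "SPREAD"},
        ],
        # Assumption: the default EMR roles already exist in the account.
        "ServiceRole": "EMR_DefaultRole",
        "JobFlowRole": "EMR_EC2_DefaultRole",
    }


if __name__ == "__main__":
    request = build_multi_primary_request("ha-cluster", "emr-6.15.0", "subnet-EXAMPLE")
    primary = request["Instances"]["InstanceGroups"][0]
    print(primary["InstanceRole"], primary["InstanceCount"])  # MASTER 3
    # To launch for real (requires boto3 and AWS credentials):
    # import boto3
    # boto3.client("emr").run_job_flow(**request)
```

Passing one `Ec2SubnetId` rather than a list of candidate subnets keeps the request consistent with the one-subnet limitation described in the note.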

This section describes the supported applications and features of an Amazon EMR cluster with multiple primary nodes, as well as configuration details, best practices, and considerations for launching such a cluster.