Amazon EMR integration with EC2 placement groups
When you launch an Amazon EMR multiple primary node cluster on Amazon EC2, you have the option to use placement group strategies to specify how you want the primary node instances deployed to protect against hardware failure.
Placement group strategies are supported starting with Amazon EMR version 5.23.0 as an option for multiple primary node clusters. Currently, only primary node types are supported by the placement group strategy, and the SPREAD strategy is applied to those primary nodes. The SPREAD strategy places a small group of instances across separate underlying hardware to guard against the loss of multiple primary nodes in the event of a hardware failure. Note that an instance launch request could fail if there is insufficient unique hardware to fulfill the request. For more information about EC2 placement strategies and limitations, see Placement groups in the EC2 User Guide for Linux Instances.
There is an initial limit from Amazon EC2 of 500 placement group strategy-enabled clusters that can be launched per AWS region. Contact AWS support to request an increase in the number of allowed placement groups. You can identify EC2 placement groups Amazon EMR creates by tracking the key-value pair that Amazon EMR associates with the Amazon EMR placement group strategy. For more information about EC2 cluster instance tags, see View cluster instances in Amazon EC2.
Attaching the placement group managed policy to the Amazon EMR role
The placement group strategy requires a managed policy called AmazonElasticMapReducePlacementGroupPolicy, which allows Amazon EMR to create, delete, and describe placement groups on Amazon EC2. You must attach AmazonElasticMapReducePlacementGroupPolicy to the service role for Amazon EMR before you launch a cluster with multiple primary nodes.
Alternatively, you can attach the AmazonEMRServicePolicy_v2 managed policy to the Amazon EMR service role instead of the placement group managed policy. AmazonEMRServicePolicy_v2 allows the same access to placement groups on Amazon EC2 as AmazonElasticMapReducePlacementGroupPolicy. For more information, see Service role for Amazon EMR (EMR role).
The AmazonElasticMapReducePlacementGroupPolicy managed policy is the following JSON text that is created and administered by Amazon EMR.
Because the AmazonElasticMapReducePlacementGroupPolicy managed policy is updated automatically, the policy shown here may be out-of-date. Use the AWS Management Console to view the current policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Resource": "*",
      "Effect": "Allow",
      "Action": [
        "ec2:DeletePlacementGroup",
        "ec2:DescribePlacementGroups"
      ]
    },
    {
      "Resource": "arn:aws:ec2:*:*:placement-group/pg-*",
      "Effect": "Allow",
      "Action": [
        "ec2:CreatePlacementGroup"
      ]
    }
  ]
}
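As a rough illustration of what the policy grants, the following Python sketch checks whether a policy document allows the three placement group actions named above. It is a simplified check written for this page, not an IAM evaluator: the helper names are hypothetical, and it ignores Resource, Condition, NotAction, and Deny statements.

```python
import json
from fnmatch import fnmatchcase

# Actions the text above says Amazon EMR needs for placement groups.
REQUIRED_ACTIONS = {
    "ec2:CreatePlacementGroup",
    "ec2:DeletePlacementGroup",
    "ec2:DescribePlacementGroups",
}


def as_list(value):
    """A policy element may be a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def allowed_action_patterns(policy_document):
    """Collect the Action patterns from every Allow statement."""
    statements = policy_document.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    patterns = set()
    for statement in statements:
        if statement.get("Effect") == "Allow":
            patterns.update(as_list(statement.get("Action", [])))
    return patterns


def missing_placement_group_actions(policy_document):
    """Return required actions that no Allow pattern matches (wildcards honored)."""
    patterns = allowed_action_patterns(policy_document)
    return sorted(
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatchcase(action.lower(), p.lower()) for p in patterns)
    )


# The policy text shown above, copied as a string for the example.
POLICY_JSON = """
{
  "Version": "2012-10-17",
  "Statement": [
    {"Resource": "*", "Effect": "Allow",
     "Action": ["ec2:DeletePlacementGroup", "ec2:DescribePlacementGroups"]},
    {"Resource": "arn:aws:ec2:*:*:placement-group/pg-*", "Effect": "Allow",
     "Action": ["ec2:CreatePlacementGroup"]}
  ]
}
"""

policy = json.loads(POLICY_JSON)
print(missing_placement_group_actions(policy))  # prints []
```

An empty list means every required action is matched by at least one Allow statement. Because the managed policy is updated automatically, run a check like this against the current policy text rather than the copy on this page.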
Launch a cluster with multiple primary nodes with a placement group strategy
To launch a cluster with multiple primary nodes with a placement group strategy, attach the placement group managed policy AmazonElasticMapReducePlacementGroupPolicy to the Amazon EMR role. For more information, see Attaching the placement group managed policy to the Amazon EMR role.
Every time you use this role to start a cluster with multiple primary nodes, Amazon EMR attempts to launch the cluster with the SPREAD strategy applied to its primary nodes. If you use a role that does not have the placement group managed policy AmazonElasticMapReducePlacementGroupPolicy attached, Amazon EMR attempts to launch the cluster without a placement group strategy.
If you launch a cluster with multiple primary nodes with the placement-group-configs parameter using the Amazon EMR API or CLI, Amazon EMR only launches the cluster if the Amazon EMR role has the placement group managed policy AmazonElasticMapReducePlacementGroupPolicy attached. If the Amazon EMR role does not have the policy attached, the cluster start fails.
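The three cases in the preceding paragraphs can be summarized as a small decision function. This is only a restatement of the text above in Python, with hypothetical names. It is not how Amazon EMR implements the check, and it does not model an explicit NONE strategy.

```python
from enum import Enum


class LaunchOutcome(Enum):
    SPREAD_APPLIED = "launch with SPREAD applied to the primary nodes"
    NO_STRATEGY = "launch without a placement group strategy"
    LAUNCH_FAILS = "cluster start fails"


def expected_outcome(role_has_policy: bool, configs_specified: bool) -> LaunchOutcome:
    """Summarize the documented behavior for a cluster with multiple primary nodes.

    role_has_policy:   the Amazon EMR role has
                       AmazonElasticMapReducePlacementGroupPolicy attached.
    configs_specified: the launch request passes placement-group-configs
                       (with the default strategy, not NONE).
    """
    if role_has_policy:
        # With the policy attached, EMR attempts SPREAD on the primary nodes.
        return LaunchOutcome.SPREAD_APPLIED
    if configs_specified:
        # Placement group configs were requested, but the role cannot manage
        # placement groups, so the launch is rejected.
        return LaunchOutcome.LAUNCH_FAILS
    # No policy and no explicit request: launch proceeds without a strategy.
    return LaunchOutcome.NO_STRATEGY


for has_policy in (True, False):
    for specified in (True, False):
        outcome = expected_outcome(has_policy, specified)
        print(f"policy={has_policy!s:5} configs={specified!s:5} -> {outcome.value}")
```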
Example – Launching a cluster with multiple primary nodes with a placement group strategy using the Amazon EMR API.
When you use the RunJobFlow action to create a cluster with multiple primary nodes, set the PlacementGroupConfigs property to the following. Currently, the MASTER instance role automatically uses SPREAD as the placement group strategy.
{
  "Name": "ha-cluster",
  "PlacementGroupConfigs": [
    {
      "InstanceRole": "MASTER"
    }
  ],
  "ReleaseLabel": "emr-5.30.1",
  "Instances": {
    "Ec2SubnetId": "subnet-22XXXX01",
    "Ec2KeyName": "ec2_key_pair_name",
    "InstanceGroups": [
      {
        "InstanceCount": 3,
        "InstanceRole": "MASTER",
        "InstanceType": "m5.xlarge"
      },
      {
        "InstanceCount": 4,
        "InstanceRole": "CORE",
        "InstanceType": "m5.xlarge"
      }
    ]
  },
  "JobFlowRole": "EMR_EC2_DefaultRole",
  "ServiceRole": "EMR_DefaultRole"
}
- Replace ha-cluster with the name of your high-availability cluster.
- Replace subnet-22XXXX01 with your subnet ID.
- Replace ec2_key_pair_name with the name of your EC2 key pair for this cluster. The EC2 key pair is optional and only required if you want to use SSH to access your cluster.
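If you assemble the RunJobFlow request programmatically, a helper along these lines keeps the replaceable values in one place. It is a minimal sketch: the function name and its parameters are hypothetical, the instance counts and types are copied from the example above, and the Instances keys use the capitalized Ec2SubnetId and Ec2KeyName spellings from the RunJobFlow API reference. The sketch only builds the request and does not call AWS.

```python
import json
from typing import Any, Dict, Optional


def build_run_job_flow_request(
    name: str,
    subnet_id: str,
    key_name: Optional[str] = None,
    placement_strategy: Optional[str] = None,
    release_label: str = "emr-5.30.1",
) -> Dict[str, Any]:
    """Build a RunJobFlow request for a cluster with three primary nodes."""
    placement_config: Dict[str, str] = {"InstanceRole": "MASTER"}
    if placement_strategy is not None:
        # Omitting PlacementStrategy leaves the default (SPREAD) in effect.
        placement_config["PlacementStrategy"] = placement_strategy

    instances: Dict[str, Any] = {
        "Ec2SubnetId": subnet_id,
        "InstanceGroups": [
            {"InstanceCount": 3, "InstanceRole": "MASTER", "InstanceType": "m5.xlarge"},
            {"InstanceCount": 4, "InstanceRole": "CORE", "InstanceType": "m5.xlarge"},
        ],
    }
    if key_name:
        # The key pair is optional; it is only needed for SSH access.
        instances["Ec2KeyName"] = key_name

    return {
        "Name": name,
        "PlacementGroupConfigs": [placement_config],
        "ReleaseLabel": release_label,
        "Instances": instances,
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_run_job_flow_request("ha-cluster", "subnet-22XXXX01")
print(json.dumps(request, indent=2))
```

With the AWS SDK for Python installed and credentials configured (both assumptions, not shown here), the resulting dictionary could be passed as keyword arguments to the EMR client's run_job_flow method.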
Example – Launching a cluster with multiple primary nodes with a placement group strategy using the Amazon EMR CLI.
aws emr create-cluster \
    --name "ha-cluster" \
    --placement-group-configs InstanceRole=MASTER \
    --release-label emr-5.30.1 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=3,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=m5.xlarge \
    --ec2-attributes KeyName=ec2_key_pair_name,InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-22XXXX01 \
    --service-role EMR_DefaultRole \
    --applications Name=Hadoop Name=Spark
- Replace ha-cluster with the name of your high-availability cluster.
- Replace subnet-22XXXX01 with your subnet ID.
- Replace ec2_key_pair_name with the name of your EC2 key pair for this cluster. The EC2 key pair is optional and only required if you want to use SSH to access your cluster.
Launch a cluster with multiple primary nodes without a placement group strategy
To launch a cluster with multiple primary nodes without the placement group strategy, do one of the following:
- Remove the placement group managed policy AmazonElasticMapReducePlacementGroupPolicy from the Amazon EMR role, or
- Launch a cluster with multiple primary nodes with the placement-group-configs parameter using the Amazon EMR API or CLI, choosing NONE as the placement group strategy.
Example – Launching a cluster with multiple primary nodes without a placement group strategy using the Amazon EMR API.
When using the RunJobFlow action to create a cluster with multiple primary nodes, set the PlacementGroupConfigs property to the following.
{
  "Name": "ha-cluster",
  "PlacementGroupConfigs": [
    {
      "InstanceRole": "MASTER",
      "PlacementStrategy": "NONE"
    }
  ],
  "ReleaseLabel": "emr-5.30.1",
  "Instances": {
    "Ec2SubnetId": "subnet-22XXXX01",
    "Ec2KeyName": "ec2_key_pair_name",
    "InstanceGroups": [
      {
        "InstanceCount": 3,
        "InstanceRole": "MASTER",
        "InstanceType": "m5.xlarge"
      },
      {
        "InstanceCount": 4,
        "InstanceRole": "CORE",
        "InstanceType": "m5.xlarge"
      }
    ]
  },
  "JobFlowRole": "EMR_EC2_DefaultRole",
  "ServiceRole": "EMR_DefaultRole"
}
- Replace ha-cluster with the name of your high-availability cluster.
- Replace subnet-22XXXX01 with your subnet ID.
- Replace ec2_key_pair_name with the name of your EC2 key pair for this cluster. The EC2 key pair is optional and only required if you want to use SSH to access your cluster.
Example – Launching a cluster with multiple primary nodes without a placement group strategy using the Amazon EMR CLI.
aws emr create-cluster \
    --name "ha-cluster" \
    --placement-group-configs InstanceRole=MASTER,PlacementStrategy=NONE \
    --release-label emr-5.30.1 \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=3,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=4,InstanceType=m5.xlarge \
    --ec2-attributes KeyName=ec2_key_pair_name,InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-22XXXX01 \
    --service-role EMR_DefaultRole \
    --applications Name=Hadoop Name=Spark
- Replace ha-cluster with the name of your high-availability cluster.
- Replace subnet-22XXXX01 with your subnet ID.
- Replace ec2_key_pair_name with the name of your EC2 key pair for this cluster. The EC2 key pair is optional and only required if you want to use SSH to access your cluster.
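The CLI examples on this page differ only in the value passed to placement-group-configs. When the command is generated from a script, that value can be built from the chosen strategy. The following sketch assumes hypothetical helper names and only assembles and prints the argument list. It does not run the AWS CLI.

```python
import shlex
from typing import List, Optional


def placement_group_configs_arg(
    instance_role: str = "MASTER", placement_strategy: Optional[str] = None
) -> str:
    """Render the shorthand value for --placement-group-configs."""
    parts = ["InstanceRole=" + instance_role]
    if placement_strategy is not None:
        # For example "NONE" to launch without a placement group strategy.
        parts.append("PlacementStrategy=" + placement_strategy)
    return ",".join(parts)


def create_cluster_argv(
    name: str,
    subnet_id: str,
    placement_strategy: Optional[str] = None,
    key_name: Optional[str] = None,
) -> List[str]:
    """Assemble the create-cluster command used in the examples above."""
    ec2_attributes = ["InstanceProfile=EMR_EC2_DefaultRole", "SubnetId=" + subnet_id]
    if key_name:
        # The key pair is optional; include it only when SSH access is needed.
        ec2_attributes.insert(0, "KeyName=" + key_name)
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--placement-group-configs",
        placement_group_configs_arg(placement_strategy=placement_strategy),
        "--release-label", "emr-5.30.1",
        "--instance-groups",
        "InstanceGroupType=MASTER,InstanceCount=3,InstanceType=m5.xlarge",
        "InstanceGroupType=CORE,InstanceCount=4,InstanceType=m5.xlarge",
        "--ec2-attributes", ",".join(ec2_attributes),
        "--service-role", "EMR_DefaultRole",
        "--applications", "Name=Hadoop", "Name=Spark",
    ]


argv = create_cluster_argv("ha-cluster", "subnet-22XXXX01", placement_strategy="NONE")
print(shlex.join(argv))
```

Keeping the command as a list of arguments avoids shell quoting problems if the cluster name contains spaces. The list could be handed to subprocess.run on a machine where the AWS CLI is installed and configured.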
Checking placement group strategy configuration attached to the cluster with multiple primary nodes
You can use the Amazon EMR DescribeCluster API to see the placement group strategy configuration attached to a cluster with multiple primary nodes. The following example uses the describe-cluster command of the AWS CLI.
aws emr describe-cluster --cluster-id "j-xxxxx"

{
  "Cluster": {
    "Id": "j-xxxxx",
    ...
    ...
    "PlacementGroups": [
      {
        "InstanceRole": "MASTER",
        "PlacementStrategy": "SPREAD"
      }
    ]
  }
}
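To check the strategy from a script, the PlacementGroups list in the output above can be read with a few lines of Python. This is a sketch that assumes the command's JSON output has already been captured as a string. The function name is hypothetical, and the sample string contains only the fields shown above.

```python
import json
from typing import Dict, Optional


def placement_strategies(describe_cluster_output: str) -> Dict[str, Optional[str]]:
    """Map each instance role to its placement strategy.

    Returns an empty dict when the cluster has no PlacementGroups entry.
    """
    cluster = json.loads(describe_cluster_output)["Cluster"]
    return {
        group["InstanceRole"]: group.get("PlacementStrategy")
        for group in cluster.get("PlacementGroups", [])
    }


# Trimmed sample with only the fields shown in the output above.
SAMPLE_OUTPUT = """
{
  "Cluster": {
    "Id": "j-xxxxx",
    "PlacementGroups": [
      {"InstanceRole": "MASTER", "PlacementStrategy": "SPREAD"}
    ]
  }
}
"""

print(placement_strategies(SAMPLE_OUTPUT))  # prints {'MASTER': 'SPREAD'}
```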