Use capacity reservations with instance fleets in Amazon EMR

To launch On-Demand Instance fleets with capacity reservation options, attach the additional service role permissions that those options require. Because capacity reservation options must be used together with the On-Demand allocation strategy, you must also include the permissions required for the allocation strategy in your service role and managed policy. For more information, see Allocation strategy permissions.

Amazon EMR supports both open and targeted capacity reservations. The following topics show instance fleets configurations that you can use with the RunJobFlow action or create-cluster command to launch instance fleets using On-Demand Capacity Reservations.

Use open capacity reservations on a best-effort basis

If the cluster's On-Demand Instances match the attributes of open capacity reservations (instance type, platform, tenancy, and Availability Zone) available in your account, the capacity reservations are applied automatically. However, there is no guarantee that your capacity reservations will be used. To provision the cluster, Amazon EMR evaluates all the instance pools specified in the launch request and uses the lowest-priced pool that has sufficient capacity to launch all the requested core nodes. Available open capacity reservations that match the instance pool are applied automatically. If available open capacity reservations do not match the instance pool, they remain unused.

Once the core nodes are provisioned, the Availability Zone is selected and fixed. Amazon EMR provisions task nodes into instance pools, starting with the lowest-priced ones first, in the selected Availability Zone until all the task nodes are provisioned. Available open capacity reservations that match the instance pools are applied automatically.
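The best-effort behavior described above can be sketched as a small simulation. This is an illustrative model only, not Amazon EMR's implementation: the `Pool` class, its fields, and `provision_core_best_effort` are hypothetical names, the prices are arbitrary relative numbers, and `capacity` stands in for whatever capacity check the service actually performs. The sample numbers mirror the c5.xlarge, m5.xlarge, and r5.xlarge pools used in the examples in this topic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pool:
    """Hypothetical model of one instance pool (instance type in one Availability Zone)."""
    instance_type: str
    price: float             # relative On-Demand price, illustrative only
    open_reservations: int   # unused open capacity reservations matching this pool
    capacity: int            # instances the pool can launch right now (assumed known)


def provision_core_best_effort(pools: List[Pool], requested: int) -> Optional[Tuple[str, int]]:
    """Pick the lowest-priced pool that can hold all requested core nodes.

    Returns (instance_type, reservations_used), or None if no pool has enough capacity.
    """
    candidates = [p for p in pools if p.capacity >= requested]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda p: p.price)
    # Matching open reservations are consumed automatically; reservations
    # in the other pools stay unused.
    used = min(chosen.open_reservations, requested)
    chosen.open_reservations -= used
    return chosen.instance_type, used


pools = [
    Pool("c5.xlarge", 1.0, 150, 1000),
    Pool("m5.xlarge", 2.0, 100, 1000),
    Pool("r5.xlarge", 3.0, 100, 1000),
]
print(provision_core_best_effort(pools, 100))   # ('c5.xlarge', 100)
print([p.open_reservations for p in pools])     # [50, 100, 100]
```

In this model the price decides the pool and the reservations only follow along, which is why reservations in a more expensive pool can remain unused.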

The following are use cases of Amazon EMR capacity allocation logic for using open capacity reservations on a best-effort basis.

Example 1: Lowest-price instance pool in launch request has available open capacity reservations

In this case, Amazon EMR launches capacity in the lowest-price instance pool with On-Demand Instances. Your available open capacity reservations in that instance pool are used automatically.

On-Demand Strategy: lowest-price
Requested Capacity: 100

Instance Type                                          c5.xlarge   m5.xlarge   r5.xlarge
Available open capacity reservations (before launch)   150         100         100
On-Demand Price                                        $           $$          $$$
Instances Provisioned                                  100         -           -
Open capacity reservations used                        100         -           -
Available open capacity reservations (after launch)    50          100         100

After the instance fleet is launched, you can run describe-capacity-reservations to see how many unused capacity reservations remain.
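As a rough sketch of how the output of describe-capacity-reservations might be summarized, the following function totals the unused instances per instance type. It assumes the response carries a `CapacityReservations` list whose entries include `InstanceType`, `InstanceMatchCriteria`, `State`, and `AvailableInstanceCount`, as the EC2 API returns them; the function name `remaining_by_type` and the sample data are made up for illustration.

```python
from collections import defaultdict
from typing import Dict


def remaining_by_type(response: dict, match_criteria: str = "open") -> Dict[str, int]:
    """Sum unused instances per instance type across active reservations."""
    totals: Dict[str, int] = defaultdict(int)
    for cr in response.get("CapacityReservations", []):
        if cr.get("State") != "active":
            continue  # skip expired, cancelled, or pending reservations
        if cr.get("InstanceMatchCriteria") != match_criteria:
            continue  # keep only open (or only targeted) reservations
        totals[cr["InstanceType"]] += cr.get("AvailableInstanceCount", 0)
    return dict(totals)


# Made-up sample shaped like the describe-capacity-reservations output.
sample = {
    "CapacityReservations": [
        {"InstanceType": "c5.xlarge", "InstanceMatchCriteria": "open",
         "State": "active", "AvailableInstanceCount": 50},
        {"InstanceType": "m5.xlarge", "InstanceMatchCriteria": "open",
         "State": "active", "AvailableInstanceCount": 100},
        {"InstanceType": "r5.xlarge", "InstanceMatchCriteria": "targeted",
         "State": "active", "AvailableInstanceCount": 20},
        {"InstanceType": "c5.xlarge", "InstanceMatchCriteria": "open",
         "State": "expired", "AvailableInstanceCount": 5},
    ]
}
print(remaining_by_type(sample))              # {'c5.xlarge': 50, 'm5.xlarge': 100}
print(remaining_by_type(sample, "targeted"))  # {'r5.xlarge': 20}
```

The input could be the parsed JSON from the CLI command or the dictionary returned by the boto3 EC2 client's `describe_capacity_reservations` call; large accounts may need to page through results first.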

Example 2: Lowest-price instance pool in launch request does not have available open capacity reservations

In this case, Amazon EMR launches capacity in the lowest-price instance pool with On-Demand Instances. However, your open capacity reservations remain unused.

On-Demand Strategy: lowest-price
Requested Capacity: 100

Instance Type                                          c5.xlarge   m5.xlarge   r5.xlarge
Available open capacity reservations (before launch)   -           -           100
On-Demand Price                                        $           $$          $$$
Instances Provisioned                                  100         -           -
Open capacity reservations used                        -           -           -
Available open capacity reservations (after launch)    -           -           100

Configure Instance Fleets to use open capacity reservations on best-effort basis

When you use the RunJobFlow action to create an instance fleet-based cluster, set the On-Demand allocation strategy to lowest-price and CapacityReservationPreference in CapacityReservationOptions to open. Alternatively, if you leave this field blank, Amazon EMR defaults the On-Demand Instance's capacity reservation preference to open.

    "LaunchSpecifications": {
        "OnDemandSpecification": {
            "AllocationStrategy": "lowest-price",
            "CapacityReservationOptions": {
                "CapacityReservationPreference": "open"
            }
        }
    }

You can also use the Amazon EMR CLI to create an instance fleet-based cluster using open capacity reservations.

    aws emr create-cluster \
        --name 'open-ODCR-cluster' \
        --release-label emr-5.30.0 \
        --service-role EMR_DefaultRole \
        --ec2-attributes SubnetId=subnet-22XXXX01,InstanceProfile=EMR_EC2_DefaultRole \
        --instance-fleets \
            InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=c4.xlarge}'] \
            InstanceFleetType=CORE,TargetOnDemandCapacity=100,InstanceTypeConfigs=['{InstanceType=c5.xlarge},{InstanceType=m5.xlarge},{InstanceType=r5.xlarge}'],LaunchSpecifications={OnDemandSpecification='{AllocationStrategy=lowest-price,CapacityReservationOptions={CapacityReservationPreference=open}}'}

Where,

  • open-ODCR-cluster is replaced with the name of the cluster using open capacity reservations.

  • subnet-22XXXX01 is replaced with the subnet ID.
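The same open-preference fleet configuration can also be assembled programmatically. The following is a minimal sketch that only builds the request fragment and makes no AWS calls; `build_instance_fleets` is a hypothetical helper, and the instance types and capacities are copied from the CLI example above.

```python
import json
from typing import Any, Dict, List


def build_instance_fleets(core_types: List[str], core_capacity: int,
                          preference: str = "open") -> List[Dict[str, Any]]:
    """Return a MASTER and a CORE fleet; the CORE fleet carries the reservation options."""
    if preference not in ("open", "none"):
        raise ValueError("preference must be 'open' or 'none'")
    master = {
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,
        "InstanceTypeConfigs": [{"InstanceType": "c4.xlarge"}],
    }
    core = {
        "InstanceFleetType": "CORE",
        "TargetOnDemandCapacity": core_capacity,
        "InstanceTypeConfigs": [{"InstanceType": t} for t in core_types],
        "LaunchSpecifications": {
            "OnDemandSpecification": {
                "AllocationStrategy": "lowest-price",
                "CapacityReservationOptions": {
                    "CapacityReservationPreference": preference,
                },
            }
        },
    }
    return [master, core]


fleets = build_instance_fleets(["c5.xlarge", "m5.xlarge", "r5.xlarge"], 100)
print(json.dumps(fleets[1]["LaunchSpecifications"], indent=2))
```

With boto3, a list like this would typically be passed as `Instances={"InstanceFleets": fleets, ...}` to the EMR client's `run_job_flow` call, together with the cluster name, release label, subnet, and roles.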

Use open capacity reservations first

You can choose to override the lowest-price allocation strategy and prioritize using available open capacity reservations first while provisioning an Amazon EMR cluster. In this case, Amazon EMR evaluates all the instance pools with capacity reservations specified in the launch request and uses the one with the lowest price that has sufficient capacity to launch all the requested core nodes. If none of the instance pools with capacity reservations have sufficient capacity for the requested core nodes, Amazon EMR falls back to the best-effort case described in the previous topic. That is, Amazon EMR re-evaluates all the instance pools specified in the launch request and uses the one with the lowest price that has sufficient capacity to launch all the requested core nodes. Available open capacity reservations that match the instance pool are applied automatically. If available open capacity reservations do not match the instance pool, they remain unused.
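The two-step selection for core nodes can be sketched as follows. This is a simplified model under stated assumptions, not the service's actual algorithm: `Pool` and `pick_core_pool_reservations_first` are hypothetical names, and "sufficient capacity" in the first step is interpreted here as the pool's open reservations covering the entire request.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pool:
    """Hypothetical model of one instance pool."""
    instance_type: str
    price: float             # relative On-Demand price, illustrative only
    open_reservations: int   # unused open capacity reservations in this pool
    capacity: int            # instances the pool can launch right now (assumed known)


def pick_core_pool_reservations_first(
    pools: List[Pool], requested: int
) -> Optional[Tuple[str, int, str]]:
    """Return (instance_type, reservations_used, path_taken), or None."""
    # Step 1: consider only pools whose reservations cover the whole request
    # (an assumption about what "sufficient capacity" means here).
    reserved = [p for p in pools if p.open_reservations >= requested]
    if reserved:
        chosen = min(reserved, key=lambda p: p.price)
        return chosen.instance_type, requested, "reservations-first"
    # Step 2: fall back to best effort across all pools, by lowest price.
    candidates = [p for p in pools if p.capacity >= requested]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda p: p.price)
    return chosen.instance_type, min(chosen.open_reservations, requested), "best-effort"


ex1 = [Pool("c5.xlarge", 1.0, 0, 1000), Pool("m5.xlarge", 2.0, 0, 1000),
       Pool("r5.xlarge", 3.0, 150, 1000)]
ex2 = [Pool("c5.xlarge", 1.0, 10, 1000), Pool("m5.xlarge", 2.0, 50, 1000),
       Pool("r5.xlarge", 3.0, 50, 1000)]
print(pick_core_pool_reservations_first(ex1, 100))  # ('r5.xlarge', 100, 'reservations-first')
print(pick_core_pool_reservations_first(ex2, 100))  # ('c5.xlarge', 10, 'best-effort')
```

The first call picks the most expensive pool because it is the only one with enough reservations; the second falls back to the cheapest pool and uses whatever reservations happen to match.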

Once the core nodes are provisioned, the Availability Zone is selected and fixed. Amazon EMR provisions task nodes into instance pools with capacity reservations, starting with the lowest-priced ones, in the selected Availability Zone until all the task nodes are provisioned. Amazon EMR first uses the open capacity reservations available across the instance pools in the selected Availability Zone and, only if required, uses the lowest-price strategy to provision any remaining task nodes.

The following are use cases of Amazon EMR capacity allocation logic for using open capacity reservations first.

Example 1: Instance pool with available open capacity reservations in launch request has sufficient capacity for core nodes

In this case, Amazon EMR launches capacity in the instance pool with available open capacity reservations regardless of instance pool price. As a result, your open capacity reservations are used whenever possible, until all core nodes are provisioned.

On-Demand Strategy: lowest-price
Requested Capacity: 100
Usage Strategy: use-capacity-reservations-first

Instance Type                                          c5.xlarge   m5.xlarge   r5.xlarge
Available open capacity reservations (before launch)   -           -           150
On-Demand Price                                        $           $$          $$$
Instances Provisioned                                  -           -           100
Open capacity reservations used                        -           -           100
Available open capacity reservations (after launch)    -           -           50

Example 2: Instance pool with available open capacity reservations in launch request does not have sufficient capacity for core nodes

In this case, Amazon EMR falls back to launching core nodes using the lowest-price strategy, with a best-effort attempt to use capacity reservations.

On-Demand Strategy: lowest-price
Requested Capacity: 100
Usage Strategy: use-capacity-reservations-first

Instance Type                                          c5.xlarge   m5.xlarge   r5.xlarge
Available open capacity reservations (before launch)   10          50          50
On-Demand Price                                        $           $$          $$$
Instances Provisioned                                  100         -           -
Open capacity reservations used                        10          -           -
Available open capacity reservations (after launch)    -           50          50

After the instance fleet is launched, you can run describe-capacity-reservations to see how many unused capacity reservations remain.

Configure Instance Fleets to use open capacity reservations first

When you use the RunJobFlow action to create an instance fleet-based cluster, set the On-Demand allocation strategy to lowest-price and UsageStrategy for CapacityReservationOptions to use-capacity-reservations-first.

    "LaunchSpecifications": {
        "OnDemandSpecification": {
            "AllocationStrategy": "lowest-price",
            "CapacityReservationOptions": {
                "UsageStrategy": "use-capacity-reservations-first"
            }
        }
    }

You can also use the Amazon EMR CLI to create an instance fleet-based cluster that uses open capacity reservations first.

    aws emr create-cluster \
        --name 'use-CR-first-cluster' \
        --release-label emr-5.30.0 \
        --service-role EMR_DefaultRole \
        --ec2-attributes SubnetId=subnet-22XXXX01,InstanceProfile=EMR_EC2_DefaultRole \
        --instance-fleets \
            InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=c4.xlarge}'] \
            InstanceFleetType=CORE,TargetOnDemandCapacity=100,InstanceTypeConfigs=['{InstanceType=c5.xlarge},{InstanceType=m5.xlarge},{InstanceType=r5.xlarge}'],LaunchSpecifications={OnDemandSpecification='{AllocationStrategy=lowest-price,CapacityReservationOptions={UsageStrategy=use-capacity-reservations-first}}'}

Where,

  • use-CR-first-cluster is replaced with the name of the cluster using open capacity reservations.

  • subnet-22XXXX01 is replaced with the subnet ID.

Use targeted capacity reservations first

When you provision an Amazon EMR cluster, you can choose to override the lowest-price allocation strategy and prioritize using available targeted capacity reservations first. In this case, Amazon EMR evaluates all the instance pools with targeted capacity reservations specified in the launch request and picks the one with the lowest price that has sufficient capacity to launch all the requested core nodes. If none of the instance pools with targeted capacity reservations have sufficient capacity for core nodes, Amazon EMR falls back to the best-effort case described earlier. That is, Amazon EMR re-evaluates all the instance pools specified in the launch request and selects the one with the lowest price that has sufficient capacity to launch all the requested core nodes. Available open capacity reservations which match the instance pool get applied automatically. However, targeted capacity reservations remain unused.

Once the core nodes are provisioned, the Availability Zone is selected and fixed. Amazon EMR provisions task nodes into instance pools with targeted capacity reservations, starting with the lowest-priced ones, in the selected Availability Zone until all the task nodes are provisioned. Amazon EMR first tries to use the targeted capacity reservations available across the instance pools in the selected Availability Zone. Then, only if required, Amazon EMR uses the lowest-price strategy to provision any remaining task nodes.
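The task-node step can be sketched as a two-pass plan: consume reservations from the cheapest pools first, then place whatever is left with the lowest-price strategy. This is a simplified illustration under assumptions that are not stated in this topic: every pool is already in the selected Availability Zone, the cheapest pool has unlimited On-Demand capacity for the remainder, and `provision_task_nodes` and its tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple


def provision_task_nodes(
    pools: List[Tuple[str, float, int]], requested: int
) -> Dict[str, Dict[str, int]]:
    """Plan task nodes from (instance_type, price, targeted_reservations) tuples.

    Returns {instance_type: {"reserved": n, "on_demand": m}}.
    """
    plan: Dict[str, Dict[str, int]] = {}
    remaining = requested
    by_price = sorted(pools, key=lambda p: p[1])

    # Pass 1: use reservations, starting with the lowest-priced pool.
    for name, _price, reserved in by_price:
        if remaining == 0:
            break
        take = min(reserved, remaining)
        if take:
            plan[name] = {"reserved": take, "on_demand": 0}
            remaining -= take

    # Pass 2: anything still missing goes to the lowest-priced pool
    # (assumed here to have enough On-Demand capacity).
    if remaining and by_price:
        cheapest = by_price[0][0]
        entry = plan.setdefault(cheapest, {"reserved": 0, "on_demand": 0})
        entry["on_demand"] += remaining
    return plan


pools = [("c5.xlarge", 1.0, 10), ("m5.xlarge", 2.0, 50), ("r5.xlarge", 3.0, 50)]
print(provision_task_nodes(pools, 30))
print(provision_task_nodes(pools, 150))
```

In the second call the 110 reserved instances are used across all three pools before the remaining 40 task nodes are placed in the cheapest pool without a reservation.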

The following are use cases of Amazon EMR capacity allocation logic for using targeted capacity reservations first.

Example 1: Instance pool with available targeted capacity reservations in launch request has sufficient capacity for core nodes

In this case, Amazon EMR launches capacity in the instance pool with available targeted capacity reservations regardless of instance pool price. As a result, your targeted capacity reservations are used whenever possible until all core nodes are provisioned.

On-Demand Strategy: lowest-price
Usage Strategy: use-capacity-reservations-first
Requested Capacity: 100

Instance Type                                              c5.xlarge   m5.xlarge   r5.xlarge
Available targeted capacity reservations (before launch)   -           -           150
On-Demand Price                                            $           $$          $$$
Instances Provisioned                                      -           -           100
Targeted capacity reservations used                        -           -           100
Available targeted capacity reservations (after launch)    -           -           50

Example 2: Instance pool with available targeted capacity reservations in launch request does not have sufficient capacity for core nodes

In this case, Amazon EMR falls back to launching core nodes using the lowest-price strategy.

On-Demand Strategy: lowest-price
Requested Capacity: 100
Usage Strategy: use-capacity-reservations-first

Instance Type                                              c5.xlarge   m5.xlarge   r5.xlarge
Available targeted capacity reservations (before launch)   10          50          50
On-Demand Price                                            $           $$          $$$
Instances Provisioned                                      100         -           -
Targeted capacity reservations used                        10          -           -
Available targeted capacity reservations (after launch)    -           50          50

After the instance fleet is launched, you can run describe-capacity-reservations to see how many unused capacity reservations remain.

Configure Instance Fleets to use targeted capacity reservations first

When you use the RunJobFlow action to create an instance fleet-based cluster, set the On-Demand allocation strategy to lowest-price, UsageStrategy for CapacityReservationOptions to use-capacity-reservations-first, and CapacityReservationResourceGroupArn for CapacityReservationOptions to your resource group ARN. For more information, see Work with capacity reservations in the Amazon EC2 User Guide.

    "LaunchSpecifications": {
        "OnDemandSpecification": {
            "AllocationStrategy": "lowest-price",
            "CapacityReservationOptions": {
                "UsageStrategy": "use-capacity-reservations-first",
                "CapacityReservationResourceGroupArn": "arn:aws:resource-groups:sa-east-1:123456789012:group/MyCRGroup"
            }
        }
    }

Where arn:aws:resource-groups:sa-east-1:123456789012:group/MyCRGroup is replaced with your resource group ARN.

You can also use the Amazon EMR CLI to create an instance fleet-based cluster using targeted capacity reservations.

    aws emr create-cluster \
        --name 'targeted-CR-cluster' \
        --release-label emr-5.30.0 \
        --service-role EMR_DefaultRole \
        --ec2-attributes SubnetId=subnet-22XXXX01,InstanceProfile=EMR_EC2_DefaultRole \
        --instance-fleets \
            InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=c4.xlarge}'] \
            InstanceFleetType=CORE,TargetOnDemandCapacity=100,InstanceTypeConfigs=['{InstanceType=c5.xlarge},{InstanceType=m5.xlarge},{InstanceType=r5.xlarge}'],LaunchSpecifications={OnDemandSpecification='{AllocationStrategy=lowest-price,CapacityReservationOptions={UsageStrategy=use-capacity-reservations-first,CapacityReservationResourceGroupArn=arn:aws:resource-groups:sa-east-1:123456789012:group/MyCRGroup}}'}

Where,

  • targeted-CR-cluster is replaced with the name of your cluster using targeted capacity reservations.

  • subnet-22XXXX01 is replaced with the subnet ID.

  • arn:aws:resource-groups:sa-east-1:123456789012:group/MyCRGroup is replaced with your resource group ARN.

Avoid using available open capacity reservations

If you want to avoid unexpectedly using any of your open capacity reservations when launching an Amazon EMR cluster, set the On-Demand allocation strategy to lowest-price and CapacityReservationPreference for CapacityReservationOptions to none. Otherwise, Amazon EMR defaults the On-Demand Instance's capacity reservation preference to open and tries using available open capacity reservations on a best-effort basis.

    "LaunchSpecifications": {
        "OnDemandSpecification": {
            "AllocationStrategy": "lowest-price",
            "CapacityReservationOptions": {
                "CapacityReservationPreference": "none"
            }
        }
    }

You can also use the Amazon EMR CLI to create an instance fleet-based cluster without using any open capacity reservations.

    aws emr create-cluster \
        --name 'none-CR-cluster' \
        --release-label emr-5.30.0 \
        --service-role EMR_DefaultRole \
        --ec2-attributes SubnetId=subnet-22XXXX01,InstanceProfile=EMR_EC2_DefaultRole \
        --instance-fleets \
            InstanceFleetType=MASTER,TargetOnDemandCapacity=1,InstanceTypeConfigs=['{InstanceType=c4.xlarge}'] \
            InstanceFleetType=CORE,TargetOnDemandCapacity=100,InstanceTypeConfigs=['{InstanceType=c5.xlarge},{InstanceType=m5.xlarge},{InstanceType=r5.xlarge}'],LaunchSpecifications={OnDemandSpecification='{AllocationStrategy=lowest-price,CapacityReservationOptions={CapacityReservationPreference=none}}'}

Where,

  • none-CR-cluster is replaced with the name of your cluster that is not using any open capacity reservations.

  • subnet-22XXXX01 is replaced with the subnet ID.
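The four configurations covered in this topic differ only in the contents of CapacityReservationOptions. The following sketch maps each one to its options dictionary; the function name and the mode labels (`open`, `none`, `open-first`, `targeted-first`) are hypothetical shorthand invented for this example, while the keys and values are the ones shown in the JSON snippets in this topic.

```python
from typing import Dict, Optional

USE_FIRST = "use-capacity-reservations-first"


def capacity_reservation_options(
    mode: str, resource_group_arn: Optional[str] = None
) -> Dict[str, str]:
    """Return the CapacityReservationOptions for one of the documented setups."""
    if mode == "open":            # best effort (also the default when left blank)
        return {"CapacityReservationPreference": "open"}
    if mode == "none":            # never consume open reservations
        return {"CapacityReservationPreference": "none"}
    if mode == "open-first":      # prioritize open reservations over price
        return {"UsageStrategy": USE_FIRST}
    if mode == "targeted-first":  # prioritize reservations in a resource group
        if not resource_group_arn:
            raise ValueError("targeted-first requires a resource group ARN")
        return {
            "UsageStrategy": USE_FIRST,
            "CapacityReservationResourceGroupArn": resource_group_arn,
        }
    raise ValueError(f"unknown mode: {mode}")


print(capacity_reservation_options("none"))
print(capacity_reservation_options(
    "targeted-first",
    "arn:aws:resource-groups:sa-east-1:123456789012:group/MyCRGroup",
))
```

The returned dictionary would sit under `LaunchSpecifications.OnDemandSpecification`, next to `"AllocationStrategy": "lowest-price"`.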

Scenarios for using capacity reservations

You can benefit from using capacity reservations in the following scenarios.

Scenario 1: Rotate a long-running cluster using capacity reservations

When rotating a long-running cluster, you might have strict requirements on the instance types and Availability Zones for the new instances you provision. With capacity reservations, you can use capacity assurance to complete the cluster rotation without interruptions.

[Figure: Cluster rotation using available capacity reservations]

Scenario 2: Provision successive short-lived clusters using capacity reservations

You can also use capacity reservations to provision a group of successive, short-lived clusters for individual workloads so that when you terminate a cluster, the next cluster can use the capacity reservations. You can use targeted capacity reservations to ensure that only the intended clusters use the capacity reservations.

[Figure: Short-lived cluster provisioning that uses available capacity reservations]