Use Capacity Rebalancing to handle Amazon EC2 Spot interruptions - Amazon EC2 Auto Scaling

Use Capacity Rebalancing to handle Amazon EC2 Spot interruptions

You can configure Amazon EC2 Auto Scaling to monitor and automatically respond to changes that affect the availability of your Spot Instances. Capacity Rebalancing helps you maintain workload availability by proactively augmenting your fleet with a new Spot Instance before a running instance is interrupted by Amazon EC2.

The goal of Capacity Rebalancing is to keep processing your workload without interruption. When Spot Instances are at an elevated risk of interruption, the Amazon EC2 Spot service notifies Amazon EC2 Auto Scaling with an EC2 instance rebalance recommendation.

When you enable Capacity Rebalancing for your Auto Scaling group, Amazon EC2 Auto Scaling attempts to proactively replace the Spot Instances in your group that have received a rebalance recommendation. This provides an opportunity to rebalance your workload to new Spot Instances that aren't at an elevated risk of interruption. Your workload can continue to process the work while Amazon EC2 Auto Scaling launches new Spot Instances before your existing instances are interrupted.

When you don't use Capacity Rebalancing, Amazon EC2 Auto Scaling doesn't replace Spot Instances until after the Amazon EC2 Spot service interrupts the instances and their health check fails. Before interrupting an instance, Amazon EC2 always gives both an EC2 instance rebalance recommendation and a Spot two-minute instance interruption notice.

Overview

To use Capacity Rebalancing with your Auto Scaling group, the basic steps are:

  1. Configure your Auto Scaling group to use multiple instance types and Availability Zones. This way, Amazon EC2 Auto Scaling can look at the available capacity for Spot Instances in each Availability Zone. For more information, see Auto Scaling groups with multiple instance types and purchase options.

  2. Add lifecycle hooks as needed to perform a graceful shutdown of your application inside the instances that receive the rebalance notification. For more information, see Amazon EC2 Auto Scaling lifecycle hooks.

    The following are some reasons why you might use a lifecycle hook:

    • For graceful shutdown of Amazon SQS workers

    • To complete deregistration from the Domain Name System (DNS)

    • To pull system or application logs and upload them to Amazon Simple Storage Service (Amazon S3)

  3. Develop a custom action for the lifecycle hook. To invoke your custom action as soon as possible, you need to know when an instance is ready to be terminated. Find this out by detecting the lifecycle state of the instance.

    • To invoke an action outside of the instance, write an EventBridge rule and automate what action to take when an event pattern matches the rule.

    • To invoke an action inside of the instance, configure the instance to run a shutdown script and retrieve the lifecycle state through instance metadata.

    It's critical to design the custom action to finish in under two minutes. This makes sure there's enough time to complete tasks before instance termination.

After you complete these steps, you can begin using Capacity Rebalancing.

Capacity Rebalancing behavior

With Capacity Rebalancing, Amazon EC2 Auto Scaling behaves in the following way when an instance receives a rebalance recommendation:

  • When the new Spot Instance launches, Amazon EC2 Auto Scaling waits until the new instance passes its health check before it terminates the previous instance. When replacing more than one instance, the termination of each previous instance starts after the new instance has launched and passed its health check.

  • Because Amazon EC2 Auto Scaling attempts to launch new instances before terminating previous ones, being at or near the specified maximum capacity could impede or completely halt rebalancing activities. To avoid this problem, Amazon EC2 Auto Scaling can temporarily exceed the group's maximum size by up to 10 percent of the desired capacity.

  • If you didn't add a lifecycle hook to your Auto Scaling group, Amazon EC2 Auto Scaling starts terminating the previous instances as soon as the new instances pass their health check.

  • If you added a lifecycle hook, this extends the amount of time it takes before we start terminating the previous instances by the timeout value you specified for the lifecycle hook.

  • If you are using scaling policies or scheduled scaling, the scaling activities run in parallel. If a scaling activity is in progress and your Auto Scaling group is below its new desired capacity, Amazon EC2 Auto Scaling scales out first before terminating the previous instances.

If there is no capacity for your instance types in one Availability Zone, Amazon EC2 Auto Scaling keeps trying to launch Spot Instances in other enabled Availability Zones until it succeeds.

In the worst case scenario, if new instances fail to launch or their health checks fail, Amazon EC2 Auto Scaling keeps trying to relaunch them. While it's trying to launch new instances, your previous ones will eventually be interrupted and forcibly terminated with a two-minute interruption notice.

Considerations

Consider the following when using Capacity Rebalancing:

Design your application to be tolerant to Spot interruptions

Your application should be able to handle dynamic changes in the number of instances and the possibility of a Spot Instance being interrupted early. For example, if your Auto Scaling group is behind an Elastic Load Balancing load balancer, Amazon EC2 Auto Scaling waits for the instance to deregister from the load balancer before calling your lifecycle hook. If the time to deregister the instance and complete the lifecycle action takes too long, the instance might be interrupted while Amazon EC2 Auto Scaling waits for your lifecycle action to complete before terminating the instance.

It's not always possible for Amazon EC2 to send the rebalance recommendation signal before the two-minute Spot Instance interruption notice. Sometimes, the rebalance recommendation signal arrives at the same time as the two-minute interruption notice. When this happens, Amazon EC2 Auto Scaling calls the lifecycle hook and attempts to launch a new Spot Instance immediately.

Avoid an elevated risk of interruption of replacement Spot Instances

Your replacement Spot Instances might be at an elevated risk of interruption if you use the lowest-price allocation strategy. This is because we launch instances in the lowest priced pool that has available capacity at that moment, even if your replacement Spot Instances are likely to be interrupted soon after they launch. To avoid an elevated risk of interruption, we strongly recommend that you do not use the lowest-price allocation strategy. Instead, we recommend the price-capacity-optimized allocation strategy. This strategy launches replacement Spot Instances in Spot pools that are least likely to be interrupted and have the lowest possible price. Therefore, they're less likely to be interrupted in the near future.

Amazon EC2 Auto Scaling will only launch a new instance if availability is the same or better

One of the goals of Capacity Rebalancing is to improve a Spot Instance's availability. If an existing Spot Instance receives a rebalance recommendation, Amazon EC2 Auto Scaling will only launch a new instance if the new instance provides the same or better availability than the existing instance. If the risk of interruption of a new instance will be worse than the existing instance, then Amazon EC2 Auto Scaling will not launch a new instance. Amazon EC2 Auto Scaling will, however, continue to assess the Spot capacity pools based on information provided by the Amazon EC2 Spot service, and will launch a new instance if availability improves.

There is a chance that your existing instance will be interrupted without Amazon EC2 Auto Scaling proactively launching a new instance. When this happens, Amazon EC2 Auto Scaling attempts to launch a new instance as soon as it receives the Spot Instance interruption notice. This happens regardless of whether the new instance has a high risk of interruption.

Capacity Rebalancing does not increase your Spot Instance interruption rate

When you enable Capacity Rebalancing, it does not increase your Spot Instance interruption rate (the number of Spot Instances that are reclaimed when Amazon EC2 needs the capacity back). However, if Capacity Rebalancing detects an instance is at risk of interruption, Amazon EC2 Auto Scaling will immediately attempt to launch a new instance. Therefore, more instances might be replaced than if you waited for Amazon EC2 Auto Scaling to launch a new instance after the at-risk instance was interrupted.

While you might replace more instances with Capacity Rebalancing enabled, you benefit from being proactive rather than reactive. This gives you more time to take action before your instances are interrupted. With a Spot Instance interruption notice, you typically only have up to two minutes to gracefully shut down your instance. With Capacity Rebalancing launching a new instance in advance, you give existing processes a better chance of completing on your at-risk instance. You can also start your instance shutdown procedures, prevent new work from being scheduled on your at-risk instance, and prepare the newly launched instance to take over the application. With proactive replacement in Capacity Rebalancing, you benefit from graceful continuity.

The following theoretical example demonstrates the risks and benefits of using Capacity Rebalancing:

  • 2:00 PM – A rebalance recommendation is received for instance A. Amazon EC2 Auto Scaling immediately attempts to launch replacement instance B, giving you time to start your shutdown procedures.

  • 2:30 PM – A rebalance recommendation is received for instance B, which is replaced with instance C. This gives you time to start your shutdown procedures.

  • 2:32 PM – If Capacity Rebalancing isn’t enabled, and if a Spot Instance interruption notice would've been received at 2:32 PM for instance A, you would have had only two minutes to take action. However, Instance A would have continued running until this time.

Enable Capacity Rebalancing (console)

You can enable or disable Capacity Rebalancing when you create or update an Auto Scaling group.

To enable Capacity Rebalancing for a new Auto Scaling group
  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/, and choose Auto Scaling Groups from the navigation pane.

  2. Choose Create Auto Scaling group.

  3. For Step 1: Choose launch template or configuration, enter a name for the Auto Scaling group, choose a launch template, and then choose Next to proceed to the next step.

  4. For Step 2: Choose instance launch options, for Instance type requirements, choose settings to create a mixed instances group. This includes the instance types that it can launch, instance purchase options, and allocation strategies for Spot and On-Demand Instances. By default, these settings are not configured. To configure them, you must select Override launch template. For more information about creating a mixed instances group, see Auto Scaling groups with multiple instance types and purchase options.

  5. Under Network, choose the options as desired. Verify that the subnets you want to use are in different Availability Zones.

  6. Under the Allocation strategies section, choose a Spot allocation strategy. Enable or disable Capacity Rebalancing by selecting or clearing the check box under Capacity Rebalancing. You only see this option when you request a percentage of the Auto Scaling group to be launched as Spot Instances in the Instance purchase options section.

  7. Create the Auto Scaling group.

  8. (Optional) Add lifecycle hooks as needed. For more information, see Add lifecycle hooks.

To enable or disable Capacity Rebalancing for an existing Auto Scaling group
  1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/, and choose Auto Scaling Groups from the navigation pane.

  2. Select the check box next to your Auto Scaling group. A split pane opens in the bottom of the page.

  3. On the Details tab, choose Allocation strategies, Edit.

  4. Under the Allocation strategies section, enable or disable Capacity Rebalancing by selecting or clearing the check box under Capacity Rebalancing.

  5. Choose Update.

Enable Capacity Rebalancing (AWS CLI)

The following examples show how to use the AWS CLI to enable and disable Capacity Rebalancing.

Use the create-auto-scaling-group or update-auto-scaling-group command with the following parameter:

  • --capacity-rebalance / --no-capacity-rebalance – Boolean value that indicates whether Capacity Rebalancing is enabled.

Before you call the create-auto-scaling-group command, you need the name of a launch template that is configured for use with an Auto Scaling group. For more information, see Create a launch template for an Auto Scaling group.

Note

The following procedures show how to use a configuration file formatted in JSON or YAML. If you use AWS CLI version 1, you must specify a JSON-formatted configuration file. If you use AWS CLI version 2, you can specify a configuration file formatted in either YAML or JSON.

To create and configure a new Auto Scaling group
  • Use the following create-auto-scaling-group command to create a new Auto Scaling group and enable Capacity Rebalancing. This command references a JSON file as the sole parameter for your Auto Scaling group.

    aws autoscaling create-auto-scaling-group --cli-input-json file://~/config.json

    If you don't already have a CLI configuration file that specifies a mixed instances policy, create one.

    Add the following line to the top-level JSON object in the configuration file.

    { "CapacityRebalance": true }

    The following is an example config.json file.

    { "AutoScalingGroupName": "my-asg", "DesiredCapacity": 12, "MinSize": 12, "MaxSize": 15, "CapacityRebalance": true, "MixedInstancesPolicy": { "InstancesDistribution": { "OnDemandBaseCapacity": 0, "OnDemandPercentageAboveBaseCapacity": 25, "SpotAllocationStrategy": "price-capacity-optimized" }, "LaunchTemplate": { "LaunchTemplateSpecification": { "LaunchTemplateName": "my-launch-template", "Version": "$Default" }, "Overrides": [ { "InstanceType": "c5.large" }, { "InstanceType": "c5a.large" }, { "InstanceType": "m5.large" }, { "InstanceType": "m5a.large" }, { "InstanceType": "c4.large" }, { "InstanceType": "m4.large" }, { "InstanceType": "c3.large" }, { "InstanceType": "m3.large" } ] } }, "TargetGroupARNs": "arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-alb-target-group/943f017f100becff", "VPCZoneIdentifier": "subnet-5ea0c127,subnet-6194ea3b,subnet-c934b782" }
To create and configure a new Auto Scaling group
  • Use the following create-auto-scaling-group command to create a new Auto Scaling group and enable Capacity Rebalancing. This command references a YAML file as the sole parameter for your Auto Scaling group.

    aws autoscaling create-auto-scaling-group --cli-input-yaml file://~/config.yaml

    Add the following line to your configuration file formatted in YAML.

    CapacityRebalance: true

    The following is an example config.yaml file.

    --- AutoScalingGroupName: my-asg DesiredCapacity: 12 MinSize: 12 MaxSize: 15 CapacityRebalance: true MixedInstancesPolicy: InstancesDistribution: OnDemandBaseCapacity: 0 OnDemandPercentageAboveBaseCapacity: 25 SpotAllocationStrategy: price-capacity-optimized LaunchTemplate: LaunchTemplateSpecification: LaunchTemplateName: my-launch-template Version: $Default Overrides: - InstanceType: c5.large - InstanceType: c5a.large - InstanceType: m5.large - InstanceType: m5a.large - InstanceType: c4.large - InstanceType: m4.large - InstanceType: c3.large - InstanceType: m3.large TargetGroupARNs: - arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-alb-target-group/943f017f100becff VPCZoneIdentifier: subnet-5ea0c127,subnet-6194ea3b,subnet-c934b782
To enable Capacity Rebalancing for an existing Auto Scaling group
  • Use the following update-auto-scaling-group command to enable Capacity Rebalancing.

    aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg \ --capacity-rebalance
To verify that Capacity Rebalancing is enabled for an Auto Scaling group
  • Use the following describe-auto-scaling-groups command to verify that Capacity Rebalancing is enabled and to view the details.

    aws autoscaling describe-auto-scaling-groups --auto-scaling-group-name my-asg

    The following is an example response.

    { "AutoScalingGroups": [ { "AutoScalingGroupName": "my-asg", "AutoScalingGroupARN": "arn", ... "CapacityRebalance": true } ] }
To disable Capacity Rebalancing

Use the update-auto-scaling-group command with the --no-capacity-rebalance option to disable Capacity Rebalancing.

aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg \ --no-capacity-rebalance

For more information about Capacity Rebalancing, see Proactively manage Spot Instance lifecycle using the new Capacity Rebalancing feature for EC2 Auto Scaling on the AWS Compute Blog.

For more information about the EC2 instance rebalance recommendations, see EC2 instance rebalance recommendations in the Amazon EC2 User Guide for Linux Instances.

To learn more about lifecycle hooks, see the following resources.

Limitations

  • Amazon EC2 Auto Scaling can replace the instance that receives the rebalance notification only if the instance isn't protected from scale in. However, scale-in protection doesn't prevent termination from a Spot interruption. For more information, see Use instance scale-in protection.

  • Support for Capacity Rebalancing is available in all commercial AWS Regions where Amazon EC2 Auto Scaling is available, except for the Middle East (UAE) Region.