Predictive scaling for Amazon EC2 Auto Scaling - Amazon EC2 Auto Scaling

Predictive scaling for Amazon EC2 Auto Scaling

Use predictive scaling to increase the number of EC2 instances in your Auto Scaling group in advance of daily and weekly patterns in traffic flows.

Predictive scaling is well suited for situations where you have:

  • Cyclical traffic, such as high use of resources during regular business hours and low use of resources during evenings and weekends

  • Recurring on-and-off workload patterns, such as batch processing, testing, or periodic data analysis

  • Applications that take a long time to initialize, causing a noticeable latency impact on application performance during scale-out events

In general, if you have regular patterns of traffic increases and applications that take a long time to initialize, you should consider using predictive scaling. Predictive scaling can help you scale faster by launching capacity in advance of forecasted load, compared to using only dynamic scaling, which is reactive in nature. Predictive scaling can also potentially save you money on your EC2 bill by helping you avoid the need to overprovision capacity.

For example, consider an application that has high usage during business hours and low usage overnight. At the start of each business day, predictive scaling can add capacity before the first influx of traffic. This helps your application maintain high availability and performance when going from a period of lower utilization to a period of higher utilization. You don't have to wait for dynamic scaling to react to changing traffic. You also don't have to spend time reviewing your application's load patterns and trying to schedule the right amount of capacity using scheduled scaling.

Use the AWS Management Console, the AWS CLI, or one of the SDKs to add a predictive scaling policy to any Auto Scaling group.

How predictive scaling works

Predictive scaling uses machine learning to predict capacity requirements based on historical data from CloudWatch. The machine learning algorithm consumes the available historical data and calculates capacity that best fits the historical load pattern, and then continuously learns based on new data to make future forecasts more accurate.

To use predictive scaling, you first create a scaling policy with a pair of metrics and a target utilization. Forecast creation starts immediately after you create your policy if there is at least 24 hours of historical data. Predictive scaling finds patterns in CloudWatch metric data from the previous 14 days to create an hourly forecast for the next 48 hours. Forecast data is updated daily based on the most recent CloudWatch metric data.

You can configure predictive scaling in forecast only mode so that you can evaluate the forecast before you allow predictive scaling to actively scale capacity. You can then view the forecast and recent metric data from CloudWatch in graph form from the Amazon EC2 Auto Scaling console. You can also access forecast data by using the AWS CLI or one of the SDKs.

When you are ready to start scaling with predictive scaling, switch the policy from forecast only mode to forecast and scale mode. After you switch to forecast and scale mode, your Auto Scaling group starts scaling based on the forecast.

Using the forecast, Amazon EC2 Auto Scaling scales the number of instances at the beginning of each hour:

  • If actual capacity is less than the predicted capacity, Amazon EC2 Auto Scaling scales out your Auto Scaling group so that its desired capacity is equal to the predicted capacity.

  • If actual capacity is greater than the predicted capacity, Amazon EC2 Auto Scaling doesn't scale in capacity.

  • The values that you set for the minimum and maximum capacity of the Auto Scaling group are adhered to if the predicted capacity is outside of this range.

Best practices

  • Confirm whether predictive scaling is suitable for your workload. A workload is a good fit for predictive scaling if it exhibits recurring load patterns that are specific to the day of the week or the time of day. To check this, configure predictive scaling policies in forecast only mode.

  • Evaluate the forecast and its accuracy before allowing predictive scaling to actively scale your application. Predictive scaling needs at least 24 hours of historical data to start forecasting. However, forecasts are more effective if historical data spans the full two weeks. If you update your application by creating a new Auto Scaling group and deleting the old one, then your new Auto Scaling group needs 24 hours of historical load data before predictive scaling can start generating forecasts again. In this case, you might have to wait a few days for a more accurate forecast.

  • Create multiple predictive scaling policies in forecast only mode to test the potential effects of different metrics. You can create multiple predictive scaling policies for each Auto Scaling group, but only one of the policies can be used for active scaling.

  • If you choose a custom metric pair, you need to define your metrics. You can't use custom metrics, but you can use a different combination of load metric and scaling metric. To avoid issues, make sure that the load metric you choose represents the full load on your application.

  • Use predictive scaling with dynamic scaling. Dynamic scaling is used to automatically scale capacity in response to real-time changes in resource utilization. Using it with predictive scaling helps you follow the demand curve for your application closely, scaling in during periods of low traffic and scaling out when traffic is higher than expected. When multiple scaling policies are active, each policy determines the desired capacity independently, and the desired capacity is set to the maximum of those. For example, if 10 instances are required to stay at the target utilization in a target tracking scaling policy, and 8 instances are required to stay at the target utilization in a predictive scaling policy, then the group's desired capacity is set to 10.

Create a predictive scaling policy (console)

You can configure predictive scaling policies on an Auto Scaling group after the group is created.

To create a predictive scaling policy

  1. Open the Amazon EC2 Auto Scaling console at https://console.aws.amazon.com/ec2autoscaling/.

  2. Select the check box next to your Auto Scaling group.

    A split pane opens up in the bottom part of the Auto Scaling groups page, showing information about the group that's selected.

  3. On the Automatic scaling tab, in Scaling policies, choose Create predictive scaling policy.

  4. To define a policy, do the following:

    1. Enter a name for the policy.

    2. Turn on Scale based on forecast to give Amazon EC2 Auto Scaling permission to start scaling right away.

      To keep the policy in forecast only mode, keep Scale based on forecast turned off.

    3. For Metrics, choose your metrics from the list of options. Options include CPU, Network In, Network Out, Application Load Balancer request count, and Custom metric pair.

      If you chose Application Load Balancer request count per target, then choose a target group in Target group. Application Load Balancer request count per target is only supported if you have attached an Application Load Balancer target group to your Auto Scaling group.

      If you chose Custom metric pair, choose individual metrics from the drop-down lists for Load metric and Scaling metric.

  5. For Target utilization, enter the target value that Amazon EC2 Auto Scaling should maintain. Amazon EC2 Auto Scaling scales out your capacity until the average utilization is at the target utilization, or until it reaches the maximum number of instances you specified.

    If your scaling metric is... Then the target utilization represents...
    CPU

    The percentage of CPU that each instance should ideally use.

    Network In

    The average number of bytes per minute that each instance should ideally receive.

    Network Out

    The average number of bytes per minute that each instance should ideally send out.

    Application Load Balancer request count per target

    The average number of requests per minute that each instance should ideally receive.

  6. (Optional) For Pre-launch instances, choose how far in advance you want your instances launched before the forecast calls for the load to increase.

  7. (Optional) For Max capacity behavior, choose whether to allow Amazon EC2 Auto Scaling to scale out higher than the group's maximum capacity when predicted capacity exceeds the defined maximum. Turning on this setting allows scale out to occur during periods when your traffic is forecasted to be at its highest.

  8. (Optional) For Buffer maximum capacity above the forecasted capacity, choose how much additional capacity to use when the predicted capacity is close to or exceeds the maximum capacity. The value is specified as a percentage relative to the predicted capacity. For example, if the buffer is 10, this means a 10 percent buffer, so if the predicted capacity is 50 and the maximum capacity is 40, then the effective maximum capacity is 55.

    If set to 0, Amazon EC2 Auto Scaling may scale capacity higher than the maximum capacity to equal but not exceed predicted capacity.

  9. Choose Create predictive scaling policy.

Create a predictive scaling policy (AWS CLI)

Use the AWS CLI as follows to configure predictive scaling policies for your Auto Scaling group. For more information about the CloudWatch metrics you can specify for a predictive scaling policy, see PredictiveScalingMetricSpecification in the Amazon EC2 Auto Scaling API Reference.

Example 1: A predictive scaling policy that creates forecasts but doesn't scale

The following example policy shows a complete policy configuration that uses CPU utilization metrics for predictive scaling with a target utilization of 40. ForecastOnly mode is used by default, unless you explicitly specify which mode to use. Save this configuration in a file named config.json.

{ "MetricSpecifications": [ { "TargetValue": 40, "PredefinedMetricPairSpecification": { "PredefinedMetricType": "ASGCPUUtilization" } } ] }

To create this policy, run the put-scaling-policy command with the configuration file specified, as demonstrated in the following example.

aws autoscaling put-scaling-policy --policy-name cpu40-predictive-scaling-policy \ --auto-scaling-group-name my-asg --policy-type PredictiveScaling \ --predictive-scaling-configuration file://config.json

If successful, this command returns the policy's Amazon Resource Name (ARN).

{ "PolicyARN": "arn:aws:autoscaling:region:account-id:scalingPolicy:2f4f5048-d8a8-4d14-b13a-d1905620f345:autoScalingGroupName/mygroup:policyName/cpu40-predictive-scaling-policy", "Alarms": [] }

Example 2: A predictive scaling policy that forecasts and scales

For a policy that allows Amazon EC2 Auto Scaling to forecast and scale, add the property Mode with a value of ForecastAndScale. The following example shows a policy configuration that uses Application Load Balancer request count metrics. The target utilization is 1000, and predictive scaling is set to ForecastAndScale mode.

{ "MetricSpecifications": [ { "TargetValue": 1000, "PredefinedMetricPairSpecification": { "PredefinedMetricType": "ALBRequestCount", "ResourceLabel": "app/my-alb/778d41231b141a0f/targetgroup/my-alb-target-group/943f017f100becff" } } ], "Mode": "ForecastAndScale" }

To create this policy, run the put-scaling-policy command with the configuration file specified, as demonstrated in the following example.

aws autoscaling put-scaling-policy --policy-name alb1000-predictive-scaling-policy \ --auto-scaling-group-name my-asg --policy-type PredictiveScaling \ --predictive-scaling-configuration file://config.json

If successful, this command returns the policy's Amazon Resource Name (ARN).

{ "PolicyARN": "arn:aws:autoscaling:region:account-id:scalingPolicy:19556d63-7914-4997-8c81-d27ca5241386:autoScalingGroupName/mygroup:policyName/alb1000-predictive-scaling-policy", "Alarms": [] }

Example 3: A predictive scaling policy that can scale higher than maximum capacity

The following example shows how to create a policy that can scale higher than the group's maximum size limit when you need it to handle a higher than normal load. By default, Amazon EC2 Auto Scaling doesn't scale your EC2 capacity higher than your defined maximum capacity. However, it might be helpful to let it scale higher with slightly more capacity to avoid performance or availability issues.

To provide room for Amazon EC2 Auto Scaling to provision additional capacity when the capacity is predicted to be at or very close to your group's maximum size, specify the MaxCapacityBreachBehavior and MaxCapacityBuffer properties, as shown in the following example. You must specify MaxCapacityBreachBehavior with a value of IncreaseMaxCapacity. The maximum number of instances that your group can have depends on the value of MaxCapacityBuffer.

{ "MetricSpecifications": [ { "TargetValue": 70, "PredefinedMetricPairSpecification": { "PredefinedMetricType": "ASGCPUUtilization" } } ], "MaxCapacityBreachBehavior": "IncreaseMaxCapacity", "MaxCapacityBuffer": 10 }

In this example, the policy is configured to use a 10 percent buffer ("MaxCapacityBuffer": 10), so if the predicted capacity is 50 and the maximum capacity is 40, then the effective maximum capacity is 55. A policy that can scale capacity higher than the maximum capacity to equal but not exceed predicted capacity would have a buffer of 0 ("MaxCapacityBuffer": 0).

To create this policy, run the put-scaling-policy command with the configuration file specified, as demonstrated in the following example.

aws autoscaling put-scaling-policy --policy-name cpu70-predictive-scaling-policy \ --auto-scaling-group-name my-asg --policy-type PredictiveScaling \ --predictive-scaling-configuration file://config.json

If successful, this command returns the policy's Amazon Resource Name (ARN).

{ "PolicyARN": "arn:aws:autoscaling:region:account-id:scalingPolicy:d02ef525-8651-4314-bf14-888331ebd04f:autoScalingGroupName/mygroup:policyName/cpu70-predictive-scaling-policy", "Alarms": [] }

Limitations

Predictive scaling requires 24 hours of metric history before it can generate forecasts.

You currently cannot use predictive scaling with Auto Scaling groups that have a mixed instances policy.

You currently cannot specify your custom metrics in a predictive scaling policy.