Using automatic load-based scaling - AWS OpsWorks

Load-based instances let you rapidly start or stop instances in response to changes in incoming traffic. AWS OpsWorks Stacks uses Amazon CloudWatch data to compute the following metrics for each layer, which represent average values across all of the layer's instances:

  • CPU: The average CPU consumption, such as 80%.

  • Memory: The average memory consumption, such as 60%.

  • Load: The average computational work that a system performs in one minute.

You define upscaling and downscaling thresholds for any or all of these metrics. You can also use custom CloudWatch alarms as thresholds.
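As a rough illustration of how layer-wide averages relate to thresholds, the following sketch averages per-instance samples and reports which thresholds are crossed. The sample values, the threshold numbers, and the function names are assumptions made for illustration; this is not how AWS OpsWorks Stacks computes its metrics internally.

```python
from statistics import mean

METRICS = ("cpu", "memory", "load")

def layer_averages(samples):
    """Average each metric across all of a layer's instances.

    `samples` is a list of dicts, one per instance (hypothetical shape).
    """
    return {m: mean(s[m] for s in samples) for m in METRICS}

def crossed_thresholds(averages, upscale, downscale):
    """Return (metrics above their upscale limit, metrics below their downscale limit)."""
    up = [m for m, limit in upscale.items() if averages[m] > limit]
    down = [m for m, limit in downscale.items() if averages[m] < limit]
    return up, down

# Two instances in one layer; all numbers are made up.
samples = [
    {"cpu": 90.0, "memory": 55.0, "load": 2.0},
    {"cpu": 70.0, "memory": 65.0, "load": 1.0},
]
averages = layer_averages(samples)
up, down = crossed_thresholds(
    averages,
    upscale={"cpu": 75.0, "memory": 80.0},    # thresholds set for only some metrics
    downscale={"cpu": 30.0, "memory": 30.0},
)
print(averages)  # {'cpu': 80.0, 'memory': 60.0, 'load': 1.5}
print(up, down)  # ['cpu'] []
```

Here only the CPU average (80%) is above its upscaling threshold (75%), so in this model only CPU would contribute to an upscaling event.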

Crossing a threshold triggers a scaling event. You determine how AWS OpsWorks Stacks responds to scaling events by specifying the following:

  • How many instances to start or stop.

  • How long AWS OpsWorks Stacks should wait after a threshold is crossed before starting or stopping instances. For example, CPU utilization must exceed the threshold for at least 15 minutes. This value allows you to ignore brief traffic fluctuations.

  • How long AWS OpsWorks Stacks should wait after starting or stopping instances before monitoring metrics again. You usually want to allow enough time for started instances to come online or stopped instances to shut down before assessing whether the layer is still exceeding a threshold.

When a scaling event occurs, AWS OpsWorks Stacks starts or stops only load-based instances. It does not start or stop 24/7 instances or time-based instances.
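The interaction between the three settings above (batch size, wait time, and post-scaling pause) can be sketched with a small simulation. This is a simplified model under stated assumptions: one CPU sample per minute, a strict greater-than comparison, and a counter that resets whenever the metric dips below the threshold. The function name and all numbers are hypothetical, and the real service may behave differently in its details.

```python
def simulate_upscaling(cpu_by_minute, threshold, wait_minutes,
                       ignore_minutes, batch_size, stopped_instances):
    """Return a list of (minute, instances_started) upscaling events."""
    events = []
    minutes_over = 0   # consecutive minutes above the threshold
    ignore_until = 0   # first minute at which metrics are evaluated again
    for minute, cpu in enumerate(cpu_by_minute):
        if minute < ignore_until:
            minutes_over = 0       # metrics are ignored after a scaling event
            continue
        minutes_over = minutes_over + 1 if cpu > threshold else 0
        if minutes_over >= wait_minutes and stopped_instances > 0:
            # Only existing, stopped load-based instances can be started.
            started = min(batch_size, stopped_instances)
            stopped_instances -= started
            events.append((minute, started))
            minutes_over = 0
            ignore_until = minute + 1 + ignore_minutes
    return events

cpu = [50, 85, 90, 60, 85, 90, 95, 92, 91, 93, 94, 96]
events = simulate_upscaling(cpu, threshold=80, wait_minutes=3,
                            ignore_minutes=2, batch_size=2,
                            stopped_instances=3)
print(events)  # [(6, 2), (11, 1)]
```

The brief spike at minutes 1 and 2 is ignored because it does not last three minutes. The second event starts only one instance because only one stopped load-based instance remains in the pool.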

Note

Automatic load-based scaling does not create new instances; it starts and stops only those instances that you have created. You must therefore provision enough load-based instances in advance to handle the maximum anticipated load.
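One way to estimate how many load-based instances to provision in advance is simple capacity arithmetic. The sketch below assumes that capacity can be expressed as requests per second per instance and that only 24/7 instances carry the baseline; both are simplifying assumptions, and the function name and numbers are hypothetical.

```python
import math

def load_based_instances_needed(peak_requests_per_sec,
                                per_instance_capacity,
                                always_on_instances):
    """Instances to provision beyond the 24/7 instances to cover peak load."""
    total = math.ceil(peak_requests_per_sec / per_instance_capacity)
    return max(0, total - always_on_instances)

# 1800 req/s at peak, 250 req/s per instance, 3 instances running 24/7:
# ceil(1800 / 250) = 8 instances in total, so 5 must be load-based.
print(load_based_instances_needed(1800, 250, 3))  # 5
```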

To create a load-based instance

  1. On the Instances page, choose + Instance to add an instance. Choose Advanced, and then choose load-based.

    
    [Figure: Load-based scaling option on the Add instance page]

  2. Configure the instance, then choose Add Instance to add the instance to the layer.

Repeat this procedure until you have created a sufficient number of instances. You can add or remove instances later, as required.
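The same procedure can also be scripted. The following sketch assumes the AWS SDK for Python (boto3) and its OpsWorks `create_instance` operation, where `AutoScalingType="load"` marks an instance as load-based. The helper names, the IDs, and the instance type passed in are placeholders, and `client` is expected to be something like `boto3.client("opsworks")`.

```python
def load_based_instance_request(stack_id, layer_id, instance_type):
    """Build keyword arguments for the OpsWorks CreateInstance operation."""
    return {
        "StackId": stack_id,
        "LayerIds": [layer_id],
        "InstanceType": instance_type,
        "AutoScalingType": "load",  # "load" = load-based, "timer" = time-based
    }

def create_load_based_instances(client, stack_id, layer_id,
                                instance_type, count):
    """Create `count` load-based instances and return their instance IDs.

    `client` is assumed to be a boto3 OpsWorks client.
    """
    request = load_based_instance_request(stack_id, layer_id, instance_type)
    return [client.create_instance(**request)["InstanceId"]
            for _ in range(count)]
```

Keeping the request builder separate from the API call makes it possible to inspect or log the request before anything is created.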

After you have added load-based instances to a layer, you must enable load-based scaling and specify the configuration. The load-based scaling configuration is a layer property, not an instance property, that specifies when a layer should start or stop its load-based instances. It must be specified separately for each layer that uses load-based instances.

To enable and configure automatic load-based scaling

  1. In the navigation pane, under Instances, choose Load-based, and then choose edit for the appropriate layer.

    
    [Figure: edit action on instance layer]

  2. Set Load-based auto scaling enabled to On. Then set threshold and scaling parameters to define how and when to start or stop instances.

    
    [Figure: Thresholds for load-based scaling]

    Layer-average thresholds

    You can set scaling thresholds based on the following values, which are averaged over all of the layer's instances.

    • Average CPU – The layer's average CPU utilization, as a percent of the total.

    • Average memory – The layer's average memory utilization, as a percent of the total.

    • Average load – The layer's average load.

      For more information about how load is computed, see Load (computing) on Wikipedia.

    Crossing a threshold causes a scaling event: upscaling if more instances are needed, and downscaling if fewer instances are needed. AWS OpsWorks Stacks then starts or stops load-based instances based on the scaling parameters.

    Custom CloudWatch alarms

    You can use up to five custom CloudWatch alarms as upscaling or downscaling thresholds. They must be in the same region as the stack. For more information about how to create custom alarms, see Creating Amazon CloudWatch Alarms.

    Note

    To use custom alarms, you must update your service role to allow cloudwatch:DescribeAlarms. You can either have AWS OpsWorks Stacks update the role for you the first time you use this feature, or you can edit the role manually. For more information, see Allowing AWS OpsWorks Stacks to Act on Your Behalf.

    When multiple alarms are configured for load-based scaling, if any alarm is in the INSUFFICIENT_DATA state, load-based instance scaling cannot occur even if another alarm is in the ALARM state. Auto scaling can proceed only if all alarms are in the OK or ALARM states. For more information about using Amazon CloudWatch alarms, see Using Amazon CloudWatch alarms in the Amazon CloudWatch User Guide.

    Scaling parameters

    The following parameters control how AWS OpsWorks Stacks manages scaling events.

    • Start servers in batches of – The number of instances to start or stop when the scaling event occurs.

    • If thresholds are exceeded – The amount of time (in minutes) that the metric must remain over an upscaling threshold or under a downscaling threshold before AWS OpsWorks Stacks triggers a scaling event.

    • After scaling, ignore metrics – The amount of time (in minutes) after a scaling event occurs that AWS OpsWorks Stacks should ignore metrics and suppress additional scaling events.

      For example, AWS OpsWorks Stacks starts instances following an upscaling event, but the instances won't start reducing the load until they have been booted and configured. There is no point in raising additional scaling events until the new instances are online and handling requests, which typically takes several minutes. This setting allows you to direct AWS OpsWorks Stacks to suppress scaling events long enough to get the new instances online.

      You can increase this setting to prevent sudden swings in scaling when layer averages such as Average CPU, Average memory, or Average load are in temporary disagreement.

      For example, if CPU usage is above its upscaling threshold and memory usage is close to its downscaling threshold, a CPU upscaling event might immediately be followed by a memory downscaling event. To prevent this, you can increase the number of minutes in the After scaling, ignore metrics setting. In this example, the CPU upscaling would occur, but the memory downscaling event would not.

  3. To add additional load-based instances, choose + Instance, configure the settings, and then choose Add Instance. Repeat until you have enough load-based instances to handle your maximum anticipated load. Then choose Save.
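The rule for multiple custom CloudWatch alarms described in step 2 (scaling proceeds only when every alarm is in the OK or ALARM state) can be sketched as a small predicate. The alarm names are hypothetical, and the functions only model the rule as stated in this section; they are not part of any AWS API.

```python
SCALABLE_STATES = ("OK", "ALARM")

def can_scale(alarm_states):
    """True only if every configured alarm is in the OK or ALARM state."""
    return all(state in SCALABLE_STATES for state in alarm_states.values())

def alarms_triggering(alarm_states):
    """Names of alarms in ALARM, or an empty list if scaling is blocked."""
    if not can_scale(alarm_states):
        return []
    return sorted(name for name, state in alarm_states.items()
                  if state == "ALARM")

# One alarm lacks data, so the alarm that is firing cannot cause scaling.
blocked = {"queue-depth": "ALARM", "latency": "INSUFFICIENT_DATA"}
print(can_scale(blocked), alarms_triggering(blocked))  # False []

# Once every alarm reports OK or ALARM, the firing alarm counts again.
ready = {"queue-depth": "ALARM", "latency": "OK"}
print(can_scale(ready), alarms_triggering(ready))      # True ['queue-depth']
```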

Note

You can also add a new load-based instance to a layer by opening the Load-based page, and choosing Add a load-based instance (if you have not yet added a load-based instance to the layer) or + Instance (if the layer already has one or more load-based instances). Then configure the instance as described earlier in this section.
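The console settings from the preceding procedure can also be expressed as a request to the OpsWorks `SetLoadBasedAutoScaling` operation. The sketch below builds the request as plain data for boto3's `set_load_based_auto_scaling`. The mapping from console labels to fields (batch size to `InstanceCount`, wait time to `ThresholdsWaitTime`, post-scaling pause to `IgnoreMetricsTime`) is an interpretation rather than something this page states, and the layer ID and threshold values are placeholders.

```python
def scaling_thresholds(instance_count, wait_minutes, ignore_minutes,
                       cpu=None, memory=None, load=None, alarms=None):
    """Build one AutoScalingThresholds structure; unset metrics are omitted."""
    thresholds = {
        "InstanceCount": instance_count,      # instances per scaling event
        "ThresholdsWaitTime": wait_minutes,   # minutes a threshold must be crossed
        "IgnoreMetricsTime": ignore_minutes,  # minutes to ignore metrics afterward
    }
    optional = {
        "CpuThreshold": cpu,
        "MemoryThreshold": memory,
        "LoadThreshold": load,
        "Alarms": alarms,  # list of custom CloudWatch alarm names
    }
    thresholds.update({k: v for k, v in optional.items() if v is not None})
    return thresholds

def enable_request(layer_id, upscaling, downscaling):
    """Keyword arguments for set_load_based_auto_scaling with scaling enabled."""
    return {"LayerId": layer_id, "Enable": True,
            "UpScaling": upscaling, "DownScaling": downscaling}

request = enable_request(
    "LAYER_ID_PLACEHOLDER",
    upscaling=scaling_thresholds(2, wait_minutes=5, ignore_minutes=10, cpu=80.0),
    downscaling=scaling_thresholds(1, wait_minutes=10, ignore_minutes=10, cpu=30.0),
)
print(request)
```

With a boto3 OpsWorks client, the request would be sent as `client.set_load_based_auto_scaling(**request)`. Because the configuration is a layer property, one such request is needed for each layer that uses load-based instances.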

To add an existing load-based instance to a layer

  1. In the navigation pane, under Instances, choose Load-based.

  2. If you have already enabled load-based automatic scaling for a layer, choose + Instance. Otherwise, choose Add a load-based instance. Choose the Existing tab.

    
    [Figure: Add existing load-based instance to a layer]

  3. On the Existing tab, choose an instance. The list shows only load-based instances.

    Note

    If you change your mind about using an existing instance, on the New tab, create a new instance as described in the preceding procedure.

  4. Choose Add Instance to add the instance to the layer.

You can modify the configuration of automatic load-based scaling, or disable it, at any time.

To disable automatic load-based scaling

  1. In the navigation pane, under Instances, choose Load-based, and then choose edit for the appropriate layer.

  2. Switch Load-based auto scaling enabled to No.
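Disabling can likewise be scripted. This minimal sketch assumes a boto3 OpsWorks client and uses its `set_load_based_auto_scaling` and `describe_load_based_auto_scaling` operations. The helper names are hypothetical, and the layer ID is supplied by the caller.

```python
def disable_load_based_scaling(client, layer_id):
    """Turn off load-based auto scaling for one layer."""
    client.set_load_based_auto_scaling(LayerId=layer_id, Enable=False)

def is_load_based_scaling_enabled(client, layer_id):
    """Report whether the layer's load-based scaling configuration is enabled."""
    response = client.describe_load_based_auto_scaling(LayerIds=[layer_id])
    configs = response["LoadBasedAutoScalingConfigurations"]
    return any(config.get("Enable", False) for config in configs)
```

Calling `is_load_based_scaling_enabled` after `disable_load_based_scaling` is one way to confirm that the change took effect.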