PERF02-BP05 Use the available elasticity of resources
The cloud provides the flexibility to expand and reduce your resources dynamically through a variety of mechanisms to meet changes in demand. When you combine this elasticity with compute-related metrics, a workload can automatically respond to changes and use the resources it needs, and only the resources it needs.
Common anti-patterns:
- You overprovision to cover possible spikes.
- You react to alarms by manually increasing capacity.
- You increase capacity without considering provisioning time.
- You leave increased capacity after a scaling event instead of scaling back down.
- You monitor metrics that don’t directly reflect your workload’s true requirements.
Benefits of establishing this best practice: Demand can be fixed, variable, follow a pattern, or be spiky. Matching supply to demand delivers the lowest cost for a workload. Monitoring, testing, and configuring workload elasticity optimizes performance, saves money, and improves reliability as usage demands change. Although a manual approach is possible, it is impractical at larger scales. An automated, metrics-based approach ensures that resources meet demand at any given time.
Level of risk exposed if this best practice is not established: Medium
Implementation guidance
Metrics-based automation should be used to take advantage of elasticity, with the goal that the supply of resources you provision matches the demand of your workload. For example, you can use Amazon CloudWatch metrics to monitor your resources.
Combined with compute-related metrics, a workload can automatically respond to changes and use the optimal set of resources to achieve its goal. You must also plan for provisioning time and potential resource failures.
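As an illustration, here is a minimal boto3 sketch of a target-tracking scaling policy on an EC2 Auto Scaling group. The group name web-asg and the 50% CPU target are hypothetical values chosen for the example, not recommendations.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking keeps the group's average CPU near the target by
# adding instances when it rises above and removing them when it falls
# below; "web-asg" is a hypothetical, pre-existing Auto Scaling group.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # hypothetical target; tune per workload
    },
)
```

Target tracking is often a good default because a single policy handles both scale-out and scale-in.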
Instances, containers, and functions provide mechanisms for elasticity, either as a feature of the service or through Application Auto Scaling.
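For container workloads, a similar sketch using Application Auto Scaling against a hypothetical Amazon ECS service (cluster prod, service web) could look like this:

```python
import boto3

aas = boto3.client("application-autoscaling")

# Register the ECS service's desired count as a scalable target, then
# attach a target-tracking policy; names and limits are hypothetical.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/prod/web",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)
aas.put_scaling_policy(
    PolicyName="ecs-cpu-target",
    ServiceNamespace="ecs",
    ResourceId="service/prod/web",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # hypothetical CPU target
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
)
```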
Validate your metrics for scaling elastic resources up or down against the type of workload being deployed. As an example, if you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric. Instead, you can scale on the depth of the queue of transcoding jobs waiting to be processed.
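Continuing the transcoding example, the following sketch creates a CloudWatch alarm on the depth of a hypothetical SQS queue named transcode-jobs; the threshold and the scaling policy ARN placeholder are likewise hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm on how many transcoding jobs are waiting instead of CPU; when
# the backlog stays high, the alarm triggers an existing scaling policy.
cloudwatch.put_metric_alarm(
    AlarmName="transcode-queue-backlog",
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "transcode-jobs"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=100,  # hypothetical: scale out when ~100+ jobs are queued
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],  # placeholder policy ARN
)
```

A queue-depth metric reflects the actual work waiting, so the fleet scales on demand rather than on a side effect such as CPU saturation.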
Workload deployments need to handle both scale-up and scale-down events. Scaling down workload components safely is as critical as scaling up resources when demand dictates.
Create test scenarios for scaling events to verify that the workload behaves as expected.
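As a sketch of such a test, assuming a hypothetical group web-asg that normally runs two instances and should scale out to six under load, you can drive traffic and then poll the group until it reaches the expected sizes:

```python
import time
import boto3

autoscaling = boto3.client("autoscaling")

def current_capacity(group_name: str) -> int:
    """Return the number of instances currently in the group."""
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"][0]
    return len(group["Instances"])

def wait_for_capacity(group_name: str, expected: int, timeout: int = 900) -> None:
    """Poll until the group reaches the expected size or time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if current_capacity(group_name) == expected:
            return
        time.sleep(30)
    raise TimeoutError(f"{group_name} did not reach {expected} instances")

# Drive synthetic load against the workload here (not shown), then
# verify scale-out, stop the load, and verify scale-in to baseline.
wait_for_capacity("web-asg", expected=6)  # hypothetical scale-out size
wait_for_capacity("web-asg", expected=2)  # hypothetical baseline size
```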
Implementation steps
- Leverage historical data to analyze your workload’s resource demands over time (see the first sketch after this list). Ask specific questions like:
  - Is your workload steady and increasing over time at a known rate?
  - Does your workload increase and decrease in seasonal, repeatable patterns?
  - Is your workload spiky? Can the spikes be anticipated or predicted?
- Leverage monitoring services and historical data as much as possible.
- Tagging resources can help with monitoring. When using tags, refer to tagging best practices. Additionally, tags can help you manage, identify, and organize resources.
- With AWS, you can use a number of different approaches to match supply with demand. The cost optimization pillar best practices (COST09-BP01 through COST09-BP03) describe how to use the following approaches to optimize cost:
  - Create test scenarios for scale-down events to verify that the workload behaves as expected.
  - Most non-production instances should be stopped when they are not being used (see the second sketch after this list).
  - For storage needs when using Amazon Elastic Block Store (Amazon EBS), take advantage of volume-based elasticity (see the third sketch after this list).
  - For Amazon Elastic Compute Cloud (Amazon EC2), consider using Auto Scaling groups, which allow you to optimize performance and cost by automatically increasing the number of compute instances during demand spikes and decreasing capacity when demand decreases (see the fourth sketch after this list).
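The following sketches illustrate several of the steps above with boto3; every name, ID, and number in them is hypothetical. First, analyzing historical demand: this pulls two weeks of hourly average CPU for one instance so you can look for steady growth, repeatable patterns, or unpredictable spikes.

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Two weeks of hourly CPU history for a hypothetical instance.
now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(days=14),
    EndTime=now,
    Period=3600,  # one data point per hour
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], round(point["Average"], 1))
```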
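Second, stopping non-production instances when they are not in use, identified here by a hypothetical Environment=dev tag; a script like this is typically run on a schedule.

```python
import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged as non-production and stop them.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Environment", "Values": ["dev"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [
    instance["InstanceId"]
    for reservation in reservations
    for instance in reservation["Instances"]
]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
```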
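Third, taking advantage of Amazon EBS volume-based elasticity by resizing a volume in place instead of overprovisioning storage up front; the volume ID and sizes are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2")

# Grow a volume in place; Amazon EBS lets you change size, type, and
# IOPS without detaching the volume from its instance.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",
    Size=500,          # new size in GiB; volumes can grow, not shrink
    VolumeType="gp3",
    Iops=6000,
)
```

After the modification completes, the file system on the instance still needs to be extended to use the new capacity.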
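Finally, for demand that follows a repeatable daily pattern, scheduled actions on an Auto Scaling group can raise capacity ahead of the expected peak and lower it afterward; the group name, times, and sizes are hypothetical.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out ahead of business hours and back in after them.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="business-hours-scale-out",
    Recurrence="0 8 * * 1-5",   # 08:00 UTC, Monday through Friday
    MinSize=4,
    MaxSize=20,
    DesiredCapacity=8,
)
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="evening-scale-in",
    Recurrence="0 20 * * 1-5",  # 20:00 UTC, Monday through Friday
    MinSize=2,
    MaxSize=20,
    DesiredCapacity=2,
)
```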