Auto Scaling
Developer Guide (API Version 2011-01-01)

What is Auto Scaling?

Auto Scaling is a web service that enables you to automatically launch or terminate Amazon Elastic Compute Cloud (Amazon EC2) instances based on user-defined policies, health status checks, and schedules. Amazon EC2 instances are servers in the cloud. For applications configured to run on a cloud infrastructure, scaling is an important part of cost control and resource management. Scaling is the ability to increase or decrease the compute capacity of your application by either changing the number of servers (horizontal scaling) or changing the size of the servers (vertical scaling).

In a typical business situation, when your web application starts to get more traffic, you either add more servers or increase the size of your existing servers to handle the additional load. Similarly, if traffic to your web application drops, you either terminate the under-utilized servers or decrease the size of your existing servers. Depending on your infrastructure, vertical scaling might require changes to your server configuration every time you scale. With horizontal scaling, you simply increase or decrease the number of servers according to your application's demand. Whether to scale vertically or horizontally depends on factors such as your use case, cost, performance, and infrastructure.

When you use Auto Scaling, you can automatically increase the number of servers when user demand goes up, to maintain performance, and decrease the number of servers when demand goes down, to minimize costs. Auto Scaling helps you make efficient use of your compute resources by doing the work of scaling for you. This automatic scaling is the core value of the Auto Scaling service.

Auto Scaling is well suited for applications that experience hourly, daily, or weekly variability in usage and need to automatically scale horizontally to keep up with usage variability. Auto Scaling frees you from having to predict huge traffic spikes accurately and plan for provisioning resources in advance of them. With Auto Scaling, you can build a fully scalable and affordable infrastructure on the cloud.

Auto Scaling allows you to scale your compute resources dynamically and predictably:

  • Dynamically, based on conditions specified by you (for example, increasing CPU utilization of your Amazon EC2 instance).

  • Predictably, according to a schedule defined by you (for example, every Friday at 13:00:00).
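The two modes differ in what triggers a scaling action: a measured condition or a point in time. The following Python sketch illustrates both triggers in isolation. The 70% CPU threshold and the helpers `dynamic_condition_met` and `next_scheduled_run` are hypothetical illustrations, not part of the Auto Scaling API, and the schedule calculation assumes naive (time-zone-free) datetimes.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for the dynamic condition; not an Auto Scaling default.
CPU_THRESHOLD_PERCENT = 70.0

FRIDAY = 4  # datetime.weekday(): Monday == 0 ... Sunday == 6


def dynamic_condition_met(cpu_utilization_percent: float) -> bool:
    """Dynamic scaling: the trigger is a measured condition you define."""
    return cpu_utilization_percent > CPU_THRESHOLD_PERCENT


def next_scheduled_run(now: datetime) -> datetime:
    """Scheduled scaling: return the next Friday 13:00:00 at or after `now`."""
    days_ahead = (FRIDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=13, minute=0, second=0, microsecond=0
    )
    # If today is Friday but 13:00:00 has already passed, wait one week.
    if candidate < now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(dynamic_condition_met(85.0))                     # True
    print(next_scheduled_run(datetime(2024, 3, 4, 9, 0)))  # 2024-03-08 13:00:00
```

A dynamic trigger can fire at any moment a measurement crosses the threshold, whereas a scheduled trigger is known in advance, which is why the schedule can be computed without observing any traffic.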

Let's look at an example of how scaling works. Suppose you have a web application that runs on a single cloud server. The single server performs well under regular traffic. Occasionally, however, traffic to your application rises to as much as three times the normal load, and you need an additional cloud server to handle it. For your application to scale gracefully with the additional load, you'll need to launch the second server before the increased load arrives, and then terminate it after traffic returns to normal levels. This approach works best when your application has predictable traffic patterns, so that you know when to launch the additional server and when to terminate it.

However, what if you do not know when the next traffic spike will hit your application? If spikes cannot be predicted, you would need to launch two cloud servers and keep both running at all times, even though the second server rarely gets any traffic. The additional server incurs costs the entire time it is running.
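Because a server incurs costs while it runs, the trade-off can be counted in server-hours. The sketch below uses hypothetical figures (a 720-hour month with 40 hours of spike traffic) and ignores the time it takes to launch an instance, so the on-demand figure is idealized; multiply either result by a per-hour price to estimate cost.

```python
# Hypothetical figures for illustration only.
HOURS_PER_MONTH = 720  # 30 days x 24 hours
SPIKE_HOURS = 40       # hours per month when the second server is needed


def server_hours_always_on(hours: int) -> int:
    """Two servers kept running for the whole period."""
    return 2 * hours


def server_hours_on_demand(hours: int, spike_hours: int) -> int:
    """One baseline server, plus a second one only during spikes (idealized)."""
    return hours + spike_hours


always_on = server_hours_always_on(HOURS_PER_MONTH)               # 1440
on_demand = server_hours_on_demand(HOURS_PER_MONTH, SPIKE_HOURS)  # 760
print(f"always on: {always_on} server-hours, on demand: {on_demand} server-hours")
```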

What happens in this example if you use Auto Scaling? First, you no longer have to keep the second server running all the time. Instead, you define the conditions that indicate increasing traffic to your application servers, and then tell Auto Scaling to launch a similar application server whenever those conditions are met. Second, you define another set of conditions that indicate decreasing traffic, and then tell Auto Scaling to terminate a server when those conditions are met. The following diagram illustrates a set of simple Auto Scaling conditions.

Auto Scaling Architectural Diagram
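The two sets of conditions in this example can be sketched as a simple rule: add a server when traffic rises, remove one when it falls, and never go below one server or above two. In the following Python sketch, average CPU utilization stands in for "traffic", and the 70% and 30% thresholds, the 1-to-2 server bounds, and the `desired_capacity` and `simulate` functions are hypothetical illustrations, not Auto Scaling defaults or APIs.

```python
# Hypothetical bounds and thresholds for the two-server example.
MIN_SERVERS = 1
MAX_SERVERS = 2
SCALE_OUT_CPU_PERCENT = 70.0  # condition indicating increasing traffic
SCALE_IN_CPU_PERCENT = 30.0   # condition indicating decreasing traffic


def desired_capacity(current: int, avg_cpu_percent: float) -> int:
    """Return the server count after applying the scale-out/scale-in conditions."""
    if avg_cpu_percent > SCALE_OUT_CPU_PERCENT:
        target = current + 1  # launch a similar server
    elif avg_cpu_percent < SCALE_IN_CPU_PERCENT:
        target = current - 1  # terminate a server
    else:
        target = current      # neither condition is met
    # Clamp to the allowed range so there is always at least one server
    # and never more than two.
    return max(MIN_SERVERS, min(MAX_SERVERS, target))


def simulate(readings: list[float], start: int = MIN_SERVERS) -> list[int]:
    """Apply desired_capacity to a sequence of CPU readings."""
    servers = start
    history = []
    for cpu in readings:
        servers = desired_capacity(servers, cpu)
        history.append(servers)
    return history


if __name__ == "__main__":
    print(simulate([20.0, 45.0, 85.0, 90.0, 60.0, 25.0]))  # [1, 1, 2, 2, 2, 1]
```

Using two separate thresholds rather than one leaves a band (30% to 70% here) in which nothing happens, so a reading that hovers near a single cutoff does not alternately launch and terminate a server.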