Auto Scaling is a web service that enables you to automatically launch or terminate Amazon Elastic Compute Cloud (Amazon EC2) instances based on user-defined policies, health status checks, and schedules. Amazon EC2 instances are servers in the cloud. For applications configured to run on a cloud infrastructure, scaling is an important part of cost control and resource management. Scaling is the ability to increase or decrease the compute capacity of your application by either changing the number of servers (horizontal scaling) or changing the size of the servers (vertical scaling).
In a typical business situation, when your web application starts to get more traffic, you either add more servers or increase the size of your existing servers to handle the additional load. Similarly, if the traffic to your web application starts to slow down, you either terminate the under-utilized servers or decrease the size of your existing servers. Depending on your infrastructure, vertical scaling might involve changes to your server configurations every time you scale. With horizontal scaling, you simply increase or decrease the number of servers according to your application's demands. The decision when to scale vertically and when to scale horizontally depends on factors such as your use case, cost, performance, and infrastructure.
When you use Auto Scaling, the number of servers increases automatically when user demand goes up, so that performance is maintained, and decreases when demand goes down, so that costs are minimized. Auto Scaling helps you make efficient use of your compute resources by automatically doing the work of scaling for you. This automatic scaling is the core value of the Auto Scaling service.
Auto Scaling is well suited for applications that experience hourly, daily, or weekly variability in usage and need to scale horizontally, automatically, to keep up with that variability. Auto Scaling frees you from having to predict large traffic spikes accurately and provision resources ahead of them. With Auto Scaling, you can build a fully scalable and affordable infrastructure in the cloud.
Auto Scaling allows you to scale your compute resources both dynamically and predictably:
Dynamically, based on conditions that you specify (for example, increasing CPU utilization of your Amazon EC2 instances).
Predictably, according to a schedule that you define (for example, every Friday at 13:00:00).
Let's look at an example of how Auto Scaling works. Suppose you have a web application that runs on a single cloud server. The single server performs well under regular traffic. Occasionally, however, the traffic to your application increases to as much as three times the normal load. When that happens, you need an additional cloud server to handle the increased traffic. For your application to scale gracefully, you need to launch the second cloud server before the increased load arrives, and then terminate that server after traffic returns to normal levels. This manual process works only when your application has predictable traffic patterns, so that you know when to launch the additional server and when to terminate it.
However, what if you do not know when the next traffic spike will hit your application? When traffic spikes cannot be predicted, you would need to launch two cloud servers and keep them running at all times, even though the second server rarely gets any traffic. The additional server incurs costs for as long as it is running.
What happens in this example if you use Auto Scaling? First, you do not have to keep the second server running all the time. Instead, you define the conditions that indicate increasing traffic to your application servers, and then tell Auto Scaling to launch a similar application server whenever those conditions are met. Second, you define another set of conditions that indicate decreasing traffic to your application servers, and then tell Auto Scaling to terminate a server when those conditions are met. The following diagram illustrates a set of simple Auto Scaling conditions.
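The two sets of conditions described above can be sketched as a small simulation. This is a toy model, not the Auto Scaling API: the `ScalingCondition` class, the 70% and 30% CPU thresholds, and the limit of two servers are all assumptions chosen to match the single-server example.

```python
from dataclasses import dataclass


@dataclass
class ScalingCondition:
    """Hypothetical threshold rule for illustration; not an AWS object."""
    threshold: float  # average CPU utilization, in percent
    adjustment: int   # servers to add (positive) or remove (negative)


# Assumed thresholds; a real application would pick its own values.
SCALE_OUT = ScalingCondition(threshold=70.0, adjustment=+1)
SCALE_IN = ScalingCondition(threshold=30.0, adjustment=-1)


def decide(avg_cpu, servers, minimum=1, maximum=2):
    """Return the server count after one evaluation of both conditions."""
    if avg_cpu >= SCALE_OUT.threshold:
        servers += SCALE_OUT.adjustment
    elif avg_cpu <= SCALE_IN.threshold:
        servers += SCALE_IN.adjustment
    # Never go below the minimum or above the maximum.
    return max(minimum, min(maximum, servers))


def simulate(cpu_samples, servers=1):
    """Apply decide() to each CPU sample and record the server count."""
    history = []
    for cpu in cpu_samples:
        servers = decide(cpu, servers)
        history.append(servers)
    return history


if __name__ == "__main__":
    print(simulate([40, 85, 90, 50, 20, 10]))
```

In this model the second server exists only while the load is high, which is the cost saving the example describes.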
In a common web application scenario, you run multiple copies of your application simultaneously to cover the volume of your customer traffic. These multiple copies of your application are hosted on identical Amazon EC2 instances (cloud servers), each handling customer requests. Auto Scaling manages the launch and termination of these EC2 instances on your behalf. This section assumes that you are familiar with Amazon Elastic Compute Cloud (Amazon EC2), and that you are using EC2 instances. For information about EC2 instances, see Amazon EC2 Instances.
When you use Auto Scaling, your EC2 instances are categorized into Auto Scaling groups for the purposes of instance scaling and management. You create Auto Scaling groups by defining the minimum, maximum, and, optionally, the desired number of running EC2 instances the group must have at any point in time.
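As a rough sketch of how the three size settings relate, the following model treats the desired number as optional and falls back to the minimum when it is omitted. The class name `AutoScalingGroupModel` and its field names are hypothetical, and rejecting out-of-range values is a choice made for this model rather than a statement about the service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AutoScalingGroupModel:
    """Toy model of a group's size settings; not an AWS API object."""
    min_size: int
    max_size: int
    desired_capacity: Optional[int] = None

    def __post_init__(self):
        if not 0 <= self.min_size <= self.max_size:
            raise ValueError("need 0 <= min_size <= max_size")
        if self.desired_capacity is None:
            # No desired number given: the group starts at the minimum.
            self.desired_capacity = self.min_size
        elif not self.min_size <= self.desired_capacity <= self.max_size:
            raise ValueError("desired_capacity must lie within [min_size, max_size]")


if __name__ == "__main__":
    print(AutoScalingGroupModel(min_size=1, max_size=4))
    print(AutoScalingGroupModel(min_size=1, max_size=4, desired_capacity=2))
```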
Your Auto Scaling group uses a launch configuration to launch EC2 instances. You create the launch configuration by providing information about the image you want Auto Scaling to use to launch EC2 instances. This information can include the image ID, instance type, key pair, security groups, and block device mapping. To learn more about Amazon Machine Images (AMIs), see AMI Basics.
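One way to picture a launch configuration is as an immutable record that every new instance is stamped from. The sketch below is a model only: `LaunchConfigurationModel`, `launch`, the field names, and the placeholder values such as `ami-EXAMPLE` are all hypothetical and do not call any AWS service.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class LaunchConfigurationModel:
    """Toy record of the launch settings listed above; not an AWS object."""
    image_id: str
    instance_type: str
    key_name: str = ""
    security_groups: Tuple[str, ...] = ()
    block_device_mappings: Tuple[str, ...] = ()


_instance_numbers = count(1)  # source of made-up instance IDs


def launch(config, how_many):
    """Return `how_many` instance records built from the same configuration."""
    return [
        {
            "instance_id": f"i-model{next(_instance_numbers):04d}",
            "image_id": config.image_id,
            "instance_type": config.instance_type,
            "security_groups": config.security_groups,
        }
        for _ in range(how_many)
    ]


if __name__ == "__main__":
    cfg = LaunchConfigurationModel(
        image_id="ami-EXAMPLE",          # placeholder, not a real image ID
        instance_type="example.type",    # placeholder instance type
        security_groups=("web-sg",),     # placeholder security group name
    )
    for instance in launch(cfg, 2):
        print(instance)
```

Because every instance is built from the same record, the instances in a group are identical apart from their IDs, which is what lets them share customer traffic interchangeably.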
In addition to creating a launch configuration and an Auto Scaling group, you must also create a scaling plan for your Auto Scaling group. A scaling plan tells Auto Scaling when and how to scale. You can create a scaling plan based on the occurrence of specified conditions (dynamic scaling) or you can create a plan based on a specific schedule.
Auto Scaling starts by launching the minimum number (or the desired number, if specified) of EC2 instances and then starts executing the scaling plan.
If your scaling plan includes dynamic scaling and your specified conditions are met, the Auto Scaling group either scales out by launching additional EC2 instances until the specified maximum number of instances is reached, or scales in by terminating EC2 instances until the number of running instances equals the specified minimum.
If your scaling plan includes scaling based on a schedule, Auto Scaling scales out by launching additional instances or scales in by terminating instances at the scheduled time, as the plan specifies.
During normal traffic loads, the Auto Scaling group maintains the number of running instances at the minimum number (or desired number, if specified) that you defined.
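The sequence above can be traced with a short simulation. This is a sketch under stated assumptions: the `run_plan` function and its event format are hypothetical, and clamping a scheduled target to the group's minimum and maximum is a simplification made for the model.

```python
def run_plan(min_size, max_size, events, desired=None):
    """Trace the number of running instances while a scaling plan executes.

    events is a time-ordered list of tuples:
      ("dynamic", delta)   - a condition fired; add or remove `delta` instances
      ("schedule", target) - a scheduled action; set the count to `target`
    """
    # Start at the desired number if given, otherwise at the minimum.
    running = desired if desired is not None else min_size
    timeline = [running]
    for kind, value in events:
        if kind == "dynamic":
            target = running + value
        elif kind == "schedule":
            target = value
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        # The group never leaves the [min_size, max_size] range.
        running = max(min_size, min(max_size, target))
        timeline.append(running)
    return timeline


if __name__ == "__main__":
    events = [
        ("dynamic", +1),
        ("dynamic", +1),
        ("dynamic", +1),   # already at the maximum, so nothing changes
        ("schedule", 1),
        ("dynamic", -1),   # already at the minimum, so nothing changes
    ]
    print(run_plan(min_size=1, max_size=3, events=events))
```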
Auto Scaling also supports maintaining the minimum number (or the desired number, if specified) of running EC2 instances at all times without associating a scaling plan with your Auto Scaling group. You can manually adjust the number of running instances in your Auto Scaling group at any time.
In addition to launching and terminating EC2 instances on demand, Auto Scaling also ensures that the EC2 instances within the Auto Scaling group are running and healthy. Auto Scaling performs a periodic health check on the current instances within an Auto Scaling group, and when it finds an unhealthy instance, it terminates that instance and launches a new one. This helps maintain the number of running instances at the minimum number (or desired number, if specified) that you defined.
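The replace-on-failure behavior can be sketched as a single pass over the group. The instance records, the `healthy` flag, and the function names here are hypothetical stand-ins; the actual health check mechanism is not described in this section.

```python
from itertools import count

_ids = count(1)  # source of made-up instance IDs


def new_instance():
    """Create a healthy instance record with a fresh, made-up ID."""
    return {"id": f"i-model{next(_ids)}", "healthy": True}


def health_check_pass(instances):
    """Replace each unhealthy instance with a new one.

    Returns the updated fleet and the IDs of the terminated instances.
    The fleet size is unchanged, so the group stays at its intended count.
    """
    fleet, terminated = [], []
    for instance in instances:
        if instance["healthy"]:
            fleet.append(instance)
        else:
            terminated.append(instance["id"])
            fleet.append(new_instance())
    return fleet, terminated


if __name__ == "__main__":
    group = [new_instance() for _ in range(3)]
    group[1]["healthy"] = False  # simulate a failed health check
    group, terminated = health_check_pass(group)
    print("terminated:", terminated)
    print("running:", [i["id"] for i in group])
```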