Scaling - Containers on AWS

This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

Scaling

Amazon ECS is a fully managed container orchestration service with no control plane for you to manage or scale. Amazon ECS provides options to efficiently auto-scale both Amazon EC2 cluster nodes and Amazon ECS services for your clusters, and it ensures the Amazon EC2 Auto Scaling groups scale in and out as needed with no further intervention required. Amazon ECS cluster auto scaling relies on Amazon ECS capacity providers, which provide the link between your Amazon ECS cluster and the Auto Scaling groups you want to use. The core responsibility of Amazon ECS cluster auto scaling is to ensure that the right number of instances are running in an Auto Scaling group to meet the needs of the tasks assigned to that group, including tasks already running as well as tasks the customer is trying to run that don’t fit on the existing instances. For more details, read the Deep Dive on Amazon ECS Cluster Auto Scaling blog.
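To make the capacity provider linkage concrete, here is a minimal Python (boto3) sketch that creates a capacity provider with managed scaling enabled and associates it with a cluster. The cluster name, capacity provider name, and Auto Scaling group ARN are hypothetical placeholders, not values from this whitepaper.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hypothetical Auto Scaling group ARN for illustration only.
ASG_ARN = (
    "arn:aws:autoscaling:us-east-1:111122223333:autoScalingGroup:"
    "example-uuid:autoScalingGroupName/demo-ecs-asg"
)

# Create a capacity provider that links the cluster to the Auto Scaling group.
# With managed scaling enabled, Amazon ECS adjusts the group's desired count
# toward the target capacity based on the tasks placed on it.
ecs.create_capacity_provider(
    name="demo-capacity-provider",
    autoScalingGroupProvider={
        "autoScalingGroupArn": ASG_ARN,
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,  # aim to keep the group fully utilized
        },
        # Requires scale-in protection to be enabled on the Auto Scaling group.
        "managedTerminationProtection": "ENABLED",
    },
)

# Associate the capacity provider with a cluster and make it the default.
ecs.put_cluster_capacity_providers(
    cluster="demo-cluster",
    capacityProviders=["demo-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "demo-capacity-provider", "weight": 1}
    ],
)
```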

Amazon ECS Auto Scaling is the ability to automatically increase or decrease the desired count of tasks in your Amazon ECS service, for both Amazon EC2 and AWS Fargate based clusters. Scaling can be driven by the service's CPU and memory utilization or by other CloudWatch metrics. Amazon ECS Auto Scaling supports the following types of auto scaling, each illustrated in the sketch after this list:

  • Target Tracking Scaling Policies: Increase or decrease the number of tasks that your service runs based on a target value for a specific metric.

  • Step Scaling Policies: Increase or decrease the number of tasks that your service runs based on a set of scaling adjustments that vary based on the size of the alarm breach.

  • Scheduled Scaling: Increase or decrease the number of tasks that your service runs based on the date and time.
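As a combined illustration of these three policy types, the following Python (boto3) sketch registers an Amazon ECS service with Application Auto Scaling and attaches a target tracking policy, a step scaling policy, and a scheduled action. The cluster and service names, thresholds, and schedule are hypothetical placeholders.

```python
import boto3

aas = boto3.client("application-autoscaling", region_name="us-east-1")

# Hypothetical cluster/service names for illustration only.
RESOURCE_ID = "service/demo-cluster/demo-service"

# Register the service's desired count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# 1) Target tracking: hold average CPU utilization near 60%.
aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)

# 2) Step scaling: add more tasks the larger the alarm breach.
#    (The policy must be attached to a CloudWatch alarm, not shown here.)
aas.put_scaling_policy(
    PolicyName="cpu-step-scaling",
    ServiceNamespace="ecs",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="StepScaling",
    StepScalingPolicyConfiguration={
        "AdjustmentType": "ChangeInCapacity",
        "Cooldown": 60,
        "StepAdjustments": [
            # Bounds are relative to the alarm threshold.
            {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 20,
             "ScalingAdjustment": 1},
            {"MetricIntervalLowerBound": 20, "ScalingAdjustment": 3},
        ],
    },
)

# 3) Scheduled scaling: raise the capacity floor before a known daily peak (UTC).
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ScheduledActionName="morning-peak",
    ResourceId=RESOURCE_ID,
    ScalableDimension="ecs:service:DesiredCount",
    Schedule="cron(0 8 * * ? *)",
    ScalableTargetAction={"MinCapacity": 10, "MaxCapacity": 20},
)
```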

Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane nodes, which consist of the Kubernetes API server, kube-scheduler, kube-controller-manager, and etcd nodes. Kubernetes provides the following options, compatible with Amazon EKS, to scale worker nodes and pods:

  • The Kubernetes Cluster Autoscaler automatically adjusts the number of worker nodes in your cluster so that all pods have a place to run and there are no unneeded worker nodes. Amazon EKS node groups are provisioned as part of an Amazon EC2 Auto Scaling group, which is compatible with the Cluster Autoscaler (see the first sketch after this list).

  • The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or stateful set based on CPU utilization or custom metrics (see the second sketch after this list). This can help your applications scale out to meet increased demand or scale in when resources are not needed, freeing up your nodes for other applications, similar to Amazon ECS Auto Scaling. When you set a target CPU utilization percentage, the Horizontal Pod Autoscaler scales your application in or out to try to meet that target.

  • KEDA is an open-source Kubernetes-based Event Driven Autoscaler that automatically scales the number of pods and works alongside standard Kubernetes components like the Horizontal Pod Autoscaler. With KEDA, you explicitly map your applications to the event sources it supports so that they scale in response to those events (see the third sketch after this list).

  • The Kubernetes Vertical Pod Autoscaler frees users from the need to keep resource requests and limits for the containers in their pods up to date, deriving them instead from historical resource usage over time. By default, it only provides calculated recommendations without automatically changing the resource requirements of the pods; when auto mode is configured, it resizes pod requests automatically and restarts existing pods onto nodes with the appropriate resources. It also maintains the ratios between limits and requests that were specified in the initial container configuration.

  • Karpenter is an open-source cluster autoscaler that automatically launches right-sized nodes in response to unschedulable pods, without requiring Amazon EC2 Auto Scaling groups. Karpenter evaluates the aggregate resource requirements of the pending pods and chooses the optimal instance type to run them. It works natively with Kubernetes scheduling constraints. It also supports a consolidation feature to help lower cluster compute costs by looking for opportunities to remove under-utilized nodes, replace expensive nodes with cheaper alternatives, and consolidate workloads onto more efficient compute resources.
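First, for the Cluster Autoscaler: its AWS implementation discovers the Auto Scaling groups it is allowed to scale by looking for well-known tags. The Python (boto3) sketch below applies those auto-discovery tags to a node group's Auto Scaling group; the group and cluster names are hypothetical placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Hypothetical Auto Scaling group and cluster names for illustration only.
ASG_NAME = "eks-demo-nodegroup-asg"
CLUSTER = "demo-eks-cluster"

# The Cluster Autoscaler's AWS provider, when run with
# --node-group-auto-discovery, scales groups carrying these tags.
autoscaling.create_or_update_tags(
    Tags=[
        {
            "ResourceId": ASG_NAME,
            "ResourceType": "auto-scaling-group",
            "Key": "k8s.io/cluster-autoscaler/enabled",
            "Value": "true",
            "PropagateAtLaunch": False,
        },
        {
            "ResourceId": ASG_NAME,
            "ResourceType": "auto-scaling-group",
            "Key": f"k8s.io/cluster-autoscaler/{CLUSTER}",
            "Value": "owned",
            "PropagateAtLaunch": False,
        },
    ]
)
```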
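Second, for the Horizontal Pod Autoscaler: the sketch below uses the official Kubernetes Python client to create an HPA that keeps a hypothetical Deployment near 60% average CPU utilization. The same object is more commonly expressed as a YAML manifest; the names and thresholds here are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod

# Scale the (hypothetical) "demo-app" Deployment between 2 and 10 replicas,
# targeting 60% average CPU utilization across its pods.
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="demo-app-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="demo-app"
        ),
        min_replicas=2,
        max_replicas=10,
        target_cpu_utilization_percentage=60,
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```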
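Third, for KEDA: the sketch below creates a KEDA ScaledObject (a custom resource) through the Kubernetes Python client, scaling a hypothetical worker Deployment on the depth of an Amazon SQS queue. The queue URL, names, and thresholds are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical ScaledObject: scale the "demo-worker" Deployment based on
# how many messages are waiting in an SQS queue.
scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "demo-sqs-scaler", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"name": "demo-worker"},  # Deployment to scale
        "minReplicaCount": 0,   # KEDA can scale to zero between events
        "maxReplicaCount": 30,
        "triggers": [
            {
                "type": "aws-sqs-queue",
                "metadata": {
                    "queueURL": "https://sqs.us-east-1.amazonaws.com/111122223333/demo-queue",
                    "queueLength": "5",  # target messages per replica
                    "awsRegion": "us-east-1",
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="default",
    plural="scaledobjects",
    body=scaled_object,
)
```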