This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.
Elastic
Elasticity
AWS Lambda
AWS Lambda has elastic scalability already built in: the service executes your code only when needed and scales automatically, from a few requests per day to thousands per second. The first time you invoke your function, AWS Lambda creates an instance of the function and runs its handler method to process the event. When the function returns a response, it stays active and waits to process additional events. If you invoke the function again while the first event is being processed, Lambda initializes another instance, and the function processes the two events concurrently. As more events come in, Lambda routes them to available instances and creates new instances as needed. When the number of requests decreases, Lambda stops unused instances to free up scaling capacity for other functions.

The function's concurrency is the number of instances that serve requests at a given time. Concurrency is subject to a Regional quota that is shared by all functions in a Region. AWS Lambda offers reserved concurrency and provisioned concurrency to control concurrency. Reserved concurrency guarantees the maximum number of concurrent instances for the function; when a function has reserved concurrency, no other function can use that concurrency. Provisioned concurrency initializes a requested number of execution environments so that they are prepared to respond immediately to your function's invocations. For more information, see AWS Lambda function scaling.
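As an illustration, both controls can be set through the Lambda API. The following Python (boto3) sketch configures reserved and provisioned concurrency for a hypothetical function named order-processor; the function name, alias, and values are assumptions for this example.

```python
import boto3

lambda_client = boto3.client("lambda")

# Reserve capacity: at most 100 concurrent instances for this function,
# carved out of the shared Regional concurrency quota.
lambda_client.put_function_concurrency(
    FunctionName="order-processor",          # hypothetical function name
    ReservedConcurrentExecutions=100,
)

# Pre-initialize 10 execution environments on a published alias so they
# can respond immediately to invocations, avoiding cold starts.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="order-processor",
    Qualifier="live",                        # alias or version (assumed)
    ProvisionedConcurrentExecutions=10,
)
```

Note that provisioned concurrency is configured on a published version or alias, not on $LATEST, which is why the Qualifier parameter is required.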
Amazon ECS and Amazon EKS
Amazon ECS cluster auto scaling gives you control over how you scale tasks within a cluster. Each cluster has one or more capacity providers and an optional default capacity provider strategy. The capacity providers determine the infrastructure to use for the tasks, and the capacity provider strategy determines how the tasks are spread across the capacity providers. When you run a task or create a service, you can either use the cluster's default capacity provider strategy or specify a capacity provider strategy that overrides the cluster's default strategy.
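For example, a capacity provider strategy can be supplied per task to override the cluster default. The sketch below, in Python (boto3), splits tasks between two Fargate capacity providers; the cluster and task definition names are assumptions.

```python
import boto3

ecs = boto3.client("ecs")

# Run tasks with an explicit capacity provider strategy, overriding the
# cluster's default strategy. Weights control how tasks are spread across
# the listed providers; "base" places a minimum number on one provider first.
ecs.run_task(
    cluster="demo-cluster",                      # assumed cluster name
    taskDefinition="web-app:3",                  # assumed task definition
    count=4,
    capacityProviderStrategy=[
        {"capacityProvider": "FARGATE", "base": 1, "weight": 1},
        {"capacityProvider": "FARGATE_SPOT", "weight": 3},
    ],
)
```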
Amazon ECS publishes Amazon CloudWatch metrics with your service's average CPU and memory usage. You can use these and other CloudWatch metrics to scale out your service (add more tasks) to deal with high demand at peak times, and to scale in your service (run fewer tasks) to reduce costs during periods of low utilization. To scale the underlying infrastructure, Amazon ECS integrates with Amazon EC2 Auto Scaling through ECS cluster auto scaling, which uses the ECS capacity provider construct to manage Amazon EC2 Auto Scaling groups on your behalf. Amazon EKS takes a similar approach with the Kubernetes Cluster Autoscaler, which automatically adjusts the number of nodes in a cluster when pods fail to launch due to a lack of resources, or when nodes in the cluster are underutilized and their pods can be rescheduled onto other nodes in the cluster.
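Service scaling of this kind is configured through Application Auto Scaling. As a rough sketch in Python (boto3), the following registers an ECS service as a scalable target and attaches a target tracking policy on average CPU utilization; the cluster and service names and the target value are assumptions.

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# The ECS service to scale, identified as service/<cluster>/<service>.
resource_id = "service/demo-cluster/web-app"     # assumed names

# Register the service's desired task count as a scalable dimension.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=20,
)

# Target tracking: add or remove tasks to keep average CPU near 60%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)
```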
Amazon ECS Service Auto Scaling supports several types of automatic scaling that influence the number of tasks for a given ECS service. Amazon ECS cluster auto scaling, however, is a capability for ECS to manage the scaling of Amazon EC2 Auto Scaling groups. With Amazon ECS cluster auto scaling, you can configure ECS to scale your Auto Scaling groups automatically, and just focus on running your tasks. ECS ensures the Auto Scaling groups scale in and out as needed with no further intervention required. ECS cluster auto scaling relies on ECS capacity providers, which provide the link between your ECS cluster and the Auto Scaling groups you want to use. For more information, refer to Amazon ECS Cluster Auto Scaling.
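To illustrate that link, the sketch below (Python, boto3) creates a capacity provider backed by an existing Auto Scaling group with managed scaling enabled, and attaches it to a cluster as the default; the ARN, account ID, and names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Create a capacity provider that links the cluster to an existing
# Auto Scaling group. With managed scaling enabled, ECS adjusts the
# group's desired capacity to keep it at the target utilization.
ecs.create_capacity_provider(
    name="demo-capacity-provider",               # assumed name
    autoScalingGroupProvider={
        "autoScalingGroupArn": (
            "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:"
            "uuid:autoScalingGroupName/demo-asg"  # placeholder ARN
        ),
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 100,               # aim for full utilization
        },
        "managedTerminationProtection": "DISABLED",
    },
)

# Associate the provider with the cluster and make it the default strategy.
ecs.put_cluster_capacity_providers(
    cluster="demo-cluster",
    capacityProviders=["demo-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "demo-capacity-provider", "weight": 1}
    ],
)
```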

Figure: Scaling with Amazon ECS
A similar approach is available for Amazon EKS with Horizontal Pod Autoscaler. The Horizontal Pod Autoscaler is a standard API resource in Kubernetes that simply requires that a metrics source (such as the Kubernetes metrics server) is installed on your Amazon EKS cluster to work. You do not need to deploy or install the Horizontal Pod Autoscaler on your cluster to begin scaling your applications.
The Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource's CPU or memory utilization; custom metrics are also supported. This can help your applications scale out to meet increased demand, or scale in when resources are not needed, freeing up your worker nodes for other applications. When you set a target CPU utilization percentage, the Horizontal Pod Autoscaler scales your application in or out to try to meet that target.

Figure: Scaling with Amazon EKS
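As a concrete sketch, a Horizontal Pod Autoscaler can be declared through the Kubernetes API. The example below uses the official Python client (kubernetes) to create an autoscaling/v2 HorizontalPodAutoscaler targeting 60% average CPU for a hypothetical Deployment named web-app; the names, namespace, and thresholds are assumptions.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (for example, one created
# by `aws eks update-kubeconfig`); in-cluster config also works.
config.load_kube_config()

# autoscaling/v2 HorizontalPodAutoscaler: keep average CPU across the
# Deployment's pods near 60%, with between 2 and 10 replicas.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "web-app-hpa"},          # assumed name
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "web-app",                    # assumed Deployment
        },
        "minReplicas": 2,
        "maxReplicas": 10,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 60},
            },
        }],
    },
}

client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

This requires a metrics source such as the Kubernetes Metrics Server to be installed on the cluster, as noted above; the HPA controller itself is part of Kubernetes and needs no separate installation.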
Amazon Kinesis Data Streams