# Auto Scaling
## Enable Auto Scaling
You can create a cluster (or a nodegroup in an existing cluster) with an IAM role that allows use of the Cluster Autoscaler:

```sh
eksctl create cluster --asg-access
```
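For an existing cluster, the same flag can be passed when creating a nodegroup. A minimal sketch, where the cluster and nodegroup names are placeholders to replace with your own:

```sh
# Sketch: grant the same autoscaler access when adding a nodegroup
# to an existing cluster (names below are placeholders)
eksctl create nodegroup --cluster=<clusterName> --name=ng1-public --asg-access
```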
This flag also sets the `k8s.io/cluster-autoscaler/enabled` and `k8s.io/cluster-autoscaler/<clusterName>` tags, so nodegroup discovery should work.
Once the cluster is running, you will need to install the Cluster Autoscaler itself.
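eksctl does not install the autoscaler for you. One common route (not specific to eksctl) is the community Helm chart; the chart name, repository URL, and values below are assumptions to adapt to your environment:

```sh
# Sketch, assuming the community cluster-autoscaler Helm chart;
# substitute your own cluster name and region
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=<clusterName> \
  --set awsRegion=<region>
```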
You should also add the following to your managed or unmanaged nodegroup definition(s) to add the tags required for the Cluster Autoscaler to scale the nodegroup:
```yaml
nodeGroups:
  - name: ng1-public
    iam:
      withAddonPolicies:
        autoScaler: true
```
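If you keep this definition in a config file, it can be applied when creating the nodegroup; the file name below is a placeholder:

```sh
# Sketch: create the nodegroup(s) from a config file containing the
# definition above (file name is a placeholder)
eksctl create nodegroup --config-file=cluster.yaml
```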
## Scaling up from 0
If you would like to be able to scale your node group up from 0 and you have labels and/or taints defined on your nodegroups, you will need to propagate these as tags on your Auto Scaling Groups (ASGs).
One way to do this is by setting the ASG tags in the `tags` field of your nodegroup definitions. For example, given a nodegroup with the following labels and taints:
```yaml
nodeGroups:
  - name: ng1-public
    ...
    labels:
      my-cool-label: pizza
    taints:
      - key: feaster
        value: "true"
        effect: NoSchedule
```
You would need to add the following ASG tags:
```yaml
nodeGroups:
  - name: ng1-public
    ...
    labels:
      my-cool-label: pizza
    taints:
      feaster: "true:NoSchedule"
    tags:
      k8s.io/cluster-autoscaler/node-template/label/my-cool-label: pizza
      k8s.io/cluster-autoscaler/node-template/taint/feaster: "true:NoSchedule"
```
For both managed and unmanaged nodegroups, this can be done automatically by setting `propagateASGTags` to `true`, which will add the labels and taints as tags to the Auto Scaling group:
```yaml
nodeGroups:
  - name: ng1-public
    ...
    labels:
      my-cool-label: pizza
    taints:
      feaster: "true:NoSchedule"
    propagateASGTags: true
```
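Either way, you can check that the tags actually landed on the Auto Scaling group; the ASG name below is a placeholder:

```sh
# Sketch: list the tags on the nodegroup's Auto Scaling group
# (the ASG name is a placeholder; find yours in the AWS console or CLI)
aws autoscaling describe-tags \
  --filters "Name=auto-scaling-group,Values=<asg-name>"
```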
## Zone-aware Auto Scaling
If your workloads are zone-specific you’ll need to create separate nodegroups for each zone. This is because the `cluster-autoscaler` assumes that all nodes in a group are exactly equivalent. So, for example, if a scale-up event is triggered by a pod which needs a zone-specific PVC (e.g. an EBS volume), the new node might get scheduled in the wrong AZ and the pod will fail to start.
You won’t need a separate nodegroup for each AZ if your environment meets the following criteria:
- No zone-specific storage requirements.
- No required `podAffinity` with topology other than host.
- No required `nodeAffinity` on zone label.
- No `nodeSelector` on a zone label.
If you meet all of the above requirements (and possibly others) then you should be safe with a single nodegroup which spans multiple AZs. Otherwise you’ll want to create separate, single-AZ nodegroups:
BEFORE:
```yaml
nodeGroups:
  - name: ng1-public
    instanceType: m5.xlarge
    # availabilityZones: ["eu-west-2a", "eu-west-2b"]
```
AFTER:
```yaml
nodeGroups:
  - name: ng1-public-2a
    instanceType: m5.xlarge
    availabilityZones: ["eu-west-2a"]
  - name: ng1-public-2b
    instanceType: m5.xlarge
    availabilityZones: ["eu-west-2b"]
```
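To roll this out on an existing cluster, one possible approach (the config file and names below are placeholders) is to create the new single-AZ nodegroups first and then remove the old multi-AZ one:

```sh
# Sketch: add the single-AZ nodegroups from a config file containing the
# AFTER definition, then drain and delete the old multi-AZ nodegroup
eksctl create nodegroup --config-file=cluster.yaml
eksctl delete nodegroup --cluster=<clusterName> --name=ng1-public
```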