Configure pod eviction time
Pod eviction time matters when you design for resiliency against a control plane or
Availability Zone failure. During Availability Zone failure testing, where the subnets lose
network connectivity, all the impacted Amazon EKS nodes lose connectivity to the Amazon EKS
control plane. Within 1 minute, all the impacted Amazon EKS nodes are marked with the NotReady
status, and the pod endpoints or EndpointSlices are removed from the service endpoints.
However, all the pods running on the impacted nodes remain in the Running status for the
default 5 minutes. Then the pods are marked as Terminating, and new pods are scheduled.
The pod-eviction-timeout parameter of the Kubernetes controller manager is set to 5 minutes
by default and can normally be changed through the Kubernetes control plane. However, because
Amazon EKS is a managed Kubernetes service, you cannot modify pod-eviction-timeout.
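For context, on clusters where taint-based eviction is active, the 5-minute wait usually appears on each pod as two default tolerations rather than only as a controller flag. The following fragment is a minimal sketch of what such a pod's spec typically contains, assuming the Kubernetes DefaultTolerationSeconds admission plugin is enabled with its default of 300 seconds. It is illustrative and not taken from a specific cluster.

```yaml
# Fragment of a pod spec (not a complete manifest).
# Assumption: these entries were added automatically because the pod
# template did not define its own tolerations for these two taints.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # 5 minutes before eviction from a NotReady node
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # 5 minutes before eviction from an unreachable node
```

A toleration that the pod template sets explicitly for either key takes precedence over these defaults.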
As a workaround, you can use taint-based evictions: add tolerations for the
node.kubernetes.io/unreachable and node.kubernetes.io/not-ready taints, each with a
tolerationSeconds value, to each deployment. The following code provides an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      tolerations:
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 2
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
      restartPolicy: Always
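Because the tolerations have to be added to each deployment, a patching tool can reduce the repetition. The following is a minimal sketch that assumes Kustomize is used to manage the manifests. The file name deployment.yaml is hypothetical, and the value of 2 seconds is copied from the example above rather than recommended.

```yaml
# kustomization.yaml (hypothetical layout; assumes Kustomize manages the manifests)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml        # hypothetical file that holds one or more Deployments
patches:
# The target below selects every Deployment listed under resources.
# The inline patch is a JSON 6902 patch: "add" sets the tolerations list
# and replaces any tolerations that a Deployment already defines.
- target:
    kind: Deployment
  patch: |-
    - op: add
      path: /spec/template/spec/tolerations
      value:
      - key: node.kubernetes.io/unreachable
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 2
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute
        tolerationSeconds: 2
```

Running kubectl kustomize in the directory that contains this file renders the patched manifests, so you can review them before applying them to the cluster.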