Amazon Elastic Kubernetes Service
Amazon EKS provides features that help make your applications more resilient to events such as degraded health or impairment of an Availability Zone (AZ). When you run your workloads in an Amazon EKS cluster, you can further improve your application environment’s fault tolerance and recovery by using zonal shift or zonal autoshift.
Using zonal shift with Amazon Elastic Kubernetes Service
Before you can use zonal shift, you must enable it for your Amazon EKS cluster. For more information, see Learn about ARC zonal shift in the Amazon Elastic Kubernetes Service User Guide.
You can start a zonal shift for an Amazon EKS cluster yourself, or you can let AWS start one for you by enabling zonal autoshift. After you enable zonal shift for your Amazon EKS cluster with ARC, you can start a zonal shift or enable zonal autoshift by using the ARC console, the AWS CLI, or the zonal shift and zonal autoshift APIs.
For more information on starting a zonal shift, see Starting, updating, or canceling a zonal shift.
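The following is a minimal sketch of how the zonal shift API calls might look from Python with boto3. It assumes that zonal shift is already enabled for the cluster, that the ARC resource identifier for an EKS cluster is the cluster ARN, and that you pass in a boto3 `arc-zonal-shift` client. The helper names `build_start_request`, `start_shift`, `extend_shift`, and `cancel_shift` are hypothetical, and any ARN or AZ ID you pass in is your own.

```python
import re

# Assumed format: a number followed by "m" (minutes) or "h" (hours), e.g. "30m" or "2h".
_DURATION = re.compile(r"^[1-9]\d*[mh]$")


def build_start_request(cluster_arn: str, az_id: str, expires_in: str, comment: str) -> dict:
    """Return keyword arguments for the ARC StartZonalShift operation (hypothetical helper)."""
    if not _DURATION.match(expires_in):
        raise ValueError(f"expires_in must look like '30m' or '2h', got {expires_in!r}")
    return {
        "resourceIdentifier": cluster_arn,  # assumed: the EKS cluster ARN
        "awayFrom": az_id,  # an AZ ID such as "use1-az1", not an AZ name
        "expiresIn": expires_in,
        "comment": comment,
    }


def start_shift(arc_client, request: dict) -> str:
    """Start a zonal shift and return its ID; arc_client is a boto3 'arc-zonal-shift' client."""
    return arc_client.start_zonal_shift(**request)["zonalShiftId"]


def extend_shift(arc_client, shift_id: str, expires_in: str) -> None:
    """Change how long an active zonal shift stays in effect."""
    arc_client.update_zonal_shift(zonalShiftId=shift_id, expiresIn=expires_in)


def cancel_shift(arc_client, shift_id: str) -> None:
    """Cancel a zonal shift so that traffic can return to the Availability Zone."""
    arc_client.cancel_zonal_shift(zonalShiftId=shift_id)
```

For example, you might create the client with `boto3.client("arc-zonal-shift", region_name="us-east-1")` and then call `start_shift(arc, build_start_request(cluster_arn, "use1-az1", "1h", "Shift away from impaired AZ"))`. Passing the client in, rather than creating it inside each helper, keeps the helpers testable without AWS credentials.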
For more information on enabling Amazon EKS with zonal shift, see Learn about ARC Zonal Shift in Amazon EKS in the Amazon Elastic Kubernetes Service User Guide.
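Enabling zonal shift on the cluster itself can also be scripted. This is a minimal sketch, assuming a boto3 `eks` client and that the `zonalShiftConfig` parameter of the EKS `UpdateClusterConfig` operation is what turns the feature on; `enable_zonal_shift` and `update_status` are hypothetical helper names.

```python
def enable_zonal_shift(eks_client, cluster_name: str) -> str:
    """Turn on ARC zonal shift for an EKS cluster and return the cluster update ID.

    eks_client is assumed to be a boto3 "eks" client. The cluster update runs
    asynchronously, so check its status before relying on zonal shift.
    """
    response = eks_client.update_cluster_config(
        name=cluster_name,
        zonalShiftConfig={"enabled": True},
    )
    return response["update"]["id"]


def update_status(eks_client, cluster_name: str, update_id: str) -> str:
    """Return the status of a cluster update, such as "InProgress" or "Successful"."""
    response = eks_client.describe_update(name=cluster_name, updateId=update_id)
    return response["update"]["status"]
```

A caller would typically poll `update_status` until it returns `"Successful"` before starting a zonal shift for the cluster.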
How zonal shift works for Amazon Elastic Kubernetes Service
During an Amazon EKS zonal shift, the following actions take place automatically:
All nodes in the impaired AZ are cordoned, which prevents the Kubernetes scheduler from placing new Pods on them.
If you use managed node groups, Availability Zone rebalancing is suspended, and the node group's Auto Scaling group is updated so that new Amazon EKS data plane nodes launch only in healthy AZs.
Nodes in the impaired AZ are not terminated, and their Pods are not evicted. As a result, when a zonal shift expires or is canceled, traffic can safely return to that AZ, which still has its full capacity.
The EndpointSlice controller finds all the Pod endpoints in the impaired AZ and removes them from the relevant EndpointSlices. This ensures that only Pod endpoints in healthy AZs are targeted to receive network traffic. When a zonal shift is canceled or expires, the EndpointSlice controller updates the EndpointSlices to include the endpoints in the restored AZ.
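To make the sequence above concrete, here is a toy, in-memory model of the documented effects: nodes in the shifted-away AZ are cordoned but not terminated, new nodes launch only in the remaining AZs, and that AZ's Pod endpoints are dropped from routing until the shift ends. This is not how EKS, the Kubernetes scheduler, or the EndpointSlice controller are implemented. The `Node`, `Endpoint`, and `ClusterModel` types and the zone names are hypothetical, and uncordoning nodes when the shift ends is an assumption that the text above does not state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    zone: str
    unschedulable: bool = False  # True once the node is cordoned


@dataclass
class Endpoint:
    pod: str
    zone: str


@dataclass
class ClusterModel:
    """Toy model of the zonal shift effects listed above; not EKS code."""

    nodes: list[Node]
    endpoints: list[Endpoint]
    zones: set[str]
    shifted_away: set[str] = field(default_factory=set)

    def start_shift(self, zone: str) -> None:
        self.shifted_away.add(zone)
        for node in self.nodes:
            if node.zone == zone:
                node.unschedulable = True  # cordon only: no termination, no eviction

    def end_shift(self, zone: str) -> None:
        # Assumption: nodes are uncordoned when the shift expires or is canceled.
        self.shifted_away.discard(zone)
        for node in self.nodes:
            if node.zone == zone:
                node.unschedulable = False

    def schedulable_nodes(self) -> list[str]:
        """Nodes the scheduler may still place new Pods on."""
        return [n.name for n in self.nodes if not n.unschedulable]

    def launch_zones(self) -> list[str]:
        """Zones where new data plane nodes may be launched."""
        return sorted(self.zones - self.shifted_away)

    def routable_endpoints(self) -> list[str]:
        """Pod endpoints that remain in EndpointSlices and receive traffic."""
        return [e.pod for e in self.endpoints if e.zone not in self.shifted_away]
```

Note that `start_shift` leaves `nodes` and `endpoints` untouched apart from the cordon flag: the Pods keep running in the impaired AZ, which is what lets traffic return there as soon as `end_shift` is called.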
For more information, see the AWS Containers blog.